Tristan

Reputation: 47

How to use Grep commands to find specific value in text file

I need to grep a file called daily_fail_counts.csv, but only to find the number of failures. Here is a shortened sample of what the file contains:

January,1,0,0
January,1,1,0
January,1,2,0
January,1,3,0
January,1,4,0
January,1,5,0
January,1,6,0
January,1,7,0
January,1,8,0

Its format is "month,day,hour,failures" and it covers every month. The last value is the number of failures found in that hour. Every row shown here says 0 because no failures were found at those times; other dates do have failures.

I'm not very good with grep commands in Linux scripts, so my question is: how do I grep for just the last value (the failure count) on each line of the file?

I'm writing this script in a file called make_accum_fail_counts.sh and will run it like this:

bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv

So I'm using the daily_fail_counts.csv as the input for the new script. Here's my script so far:

#!/bin/bash

if [ $# == 1 ]
then
    logFile=$1
fi

cat $logFile > tmpFile

hour=0
failure=0

while [ $hour -le 23 ]
do
    if [ $hour -le 23 ]
    then
        failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
    fi
    echo "$hour,$failure"
    hour=$((hour+1))
    failure=0
done
rm -rf tmpFile

I just need help with my grep command:

failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`

It should find, across all the days, the number of failures for each hour, so its output would be:

0,1000
1,1040
2,2888

Where there were 1000 failures between 0:00-1:00, 1040 failures between 1:00-2:00 and so on. Thanks in advance.

Upvotes: 2

Views: 1694

Answers (2)

user3064538

Reputation:

cat yourfile.csv | cut -d',' -f 4 | paste -s -d+ - | bc

This sums all the failures. cut -d',' -f 4 yourfile.csv splits each line on the commas and keeps the 4th value, which gives you a list of numbers. paste -s -d+ - then joins that list with plus signs, and bc evaluates the resulting expression.
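To make the stages of that pipeline visible, here is a minimal sketch on a three-line sample. The sample rows, the temporary file and the variable names are made up for illustration. Shell arithmetic stands in for bc in the last step, so the sketch needs nothing beyond cut and paste.

```shell
# Hypothetical sample data in the question's month,day,hour,failures format.
sample=$(mktemp)
printf '%s\n' 'January,1,0,2' 'January,1,1,0' 'January,1,2,5' > "$sample"

# Stage 1: keep only the 4th comma-separated field (one number per line).
cut -d',' -f 4 "$sample"

# Stage 2: join those lines with "+" to form a single arithmetic expression.
expr_text=$(cut -d',' -f 4 "$sample" | paste -s -d+ -)
echo "$expr_text"

# Stage 3: evaluate the expression (the answer pipes it to bc instead).
total=$(( $expr_text ))
echo "$total"
```

With real data, replace "$sample" with the path to the CSV file.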

You can use grep to filter it down to a single hour, something like:

cat yourfile.csv | cut -d',' -f 3,4 | grep '^0,' | cut -d',' -f 2

This gets all the failure counts for hour 0.

for hour in {0..23}; do
    cat yourfile.csv | cut -d',' -f 3,4 | grep "^$hour," | cut -d',' -f 2 | paste -s -d+ - | bc
done

This gets the totals for each hour.
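The loop above prints only the totals, while the question asks for hour,total pairs. Here is a minimal sketch of one way to get that. The function name hourly_totals and the sample rows are hypothetical. Shell arithmetic replaces bc here so that an hour with no matching rows prints 0 rather than an empty line.

```shell
# Hypothetical sample: two days, hours 0 and 1 only.
sample=$(mktemp)
printf '%s\n' 'January,1,0,2' 'January,1,1,3' \
              'January,2,0,4' 'January,2,1,0' > "$sample"

# Print "hour,total" for hours 0 through 23 (function name is illustrative).
hourly_totals() {
    file=$1
    hour=0
    while [ "$hour" -le 23 ]; do
        # Keep "hour,failures", select this hour, then join the counts with "+".
        sum_expr=$(cut -d',' -f 3,4 "$file" | grep "^$hour," |
                   cut -d',' -f 2 | paste -s -d+ -)
        # An empty expression (no rows for this hour) is treated as 0.
        echo "$hour,$(( ${sum_expr:-0} ))"
        hour=$((hour + 1))
    done
}

hourly_totals "$sample"
```

The trailing comma in the grep pattern matters: without it, ^1 would also match hours 10 through 19.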

If you want them grouped by day, you can read about the date command, figure out how to get it to output strings like January,1, and add an outer for loop to the above command that passes each line through a grep with the output of that date command.
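As a rough sketch of that per-day grouping, the version below swaps in a different source for the day strings. Instead of generating them with date, it reads the distinct month,day prefixes from the file itself. It assumes rows for the same day are adjacent, as in the question's sample. The function name daily_totals and the sample rows are hypothetical.

```shell
# Hypothetical sample: two days with two hourly rows each.
sample=$(mktemp)
printf '%s\n' 'January,1,0,2' 'January,1,1,3' \
              'January,2,0,4' 'January,2,1,0' > "$sample"

# Print "month,day,total" for each day found in the file.
daily_totals() {
    file=$1
    # uniq only collapses adjacent duplicates, hence the ordering assumption.
    cut -d',' -f 1,2 "$file" | uniq | while IFS= read -r day; do
        # The trailing comma keeps "January,1," from matching "January,10,".
        sum_expr=$(grep "^$day," "$file" | cut -d',' -f 4 | paste -s -d+ -)
        echo "$day,$(( $sum_expr ))"
    done
}

daily_totals "$sample"
```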

Personally, at this point I would start writing Python instead of bash. The pandas library is better suited for this.

Upvotes: 1

RavinderSingh13

Reputation: 133428

If I understood your question correctly, could you please try the following. It gives the total number of failures (the last, i.e. 4th, field) for each hour value, irrespective of month.

awk '
BEGIN{
  FS=OFS=","
}
!b[$3]++{
  c[++count]=$3
}
{
  a[$3]+=$4
}
END{
  for(i=1;i<=count;i++){
    print c[i],a[c[i]]
  }
}
'  Input_file

One more thing: this approach prints the output in the same order in which the $3 values first appear in Input_file.

Explanation: the same code again, with a comment on each line.

awk '                          ##Start the awk program.
BEGIN{                         ##Start the BEGIN section.
  FS=OFS=","                   ##Set both FS and OFS to a comma.
}                              ##Close the BEGIN section.
!b[$3]++{                      ##Run this block only if $3 has not been seen before; the ++ also records $3 in array b.
  c[++count]=$3                ##Increment count by 1 and store $3 in array c at that index.
}                              ##Close the block for the array b condition.
{                              ##This block runs for every line.
  a[$3]+=$4                    ##Add $4 to the running total kept in array a under index $3.
}                              ##Close the per-line block.
END{                           ##Start the END section.
  for(i=1;i<=count;i++){       ##Loop i from 1 up to the value of count.
    print c[i],a[c[i]]         ##Print the hour stored in c[i] followed by its total from array a.
  }                            ##Close the for loop.
}                              ##Close the END section.
'  Input_file                  ##Name of the input file.
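If the hours need to come out in numeric order whatever order they appear in the file, one option is to drop the order-tracking arrays and sort the result instead. This is a minimal sketch of that variation, not the answer's own code. The function name hourly_totals_sorted and the sample rows are hypothetical.

```shell
# Hypothetical sample with hours deliberately out of order.
sample=$(mktemp)
printf '%s\n' 'January,1,5,2' 'January,1,0,1' \
              'January,1,12,4' 'January,2,5,3' > "$sample"

# Sum field 4 per hour (field 3), then sort numerically on the hour.
hourly_totals_sorted() {
    awk 'BEGIN { FS = OFS = "," }
         { total[$3] += $4 }                      # accumulate per hour
         END { for (h in total) print h, total[h] }' "$1" |
        sort -t',' -k1,1n
}

hourly_totals_sorted "$sample"
```

The sort step is needed because the order of a for (h in total) loop is unspecified in awk.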

Upvotes: 1
