Reputation: 4643
How do I count the number of occurrences of a string in a column in each file, and output the result filename and count + awk
I have these 2 files, :
>cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
> cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
how do I get this output:(basically count the number of occurrences of a string (e.g. "col1")in a column in each file )
file.csv,5
fild.csv,5
Output column/field1
> awk -F, '$1 =="col1" {print $1}' file.csv
col1
col1
col1
col1
col1
Output filenamecolumn/field1, how do I add a comma as separator
> awk -F, '$1 =="col1" {print FILENAME $1}' file.csv
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
output Id like
file.csv,5
attempt working on 2 files:
> awk -F, '$1 =="col1" {print FILENAME $1}' fil*.csv
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
But the output i'd like is this:
file.csv,5
fild.csv,5
this works for me:
awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
fild.csv,5
file1.csv,0
file.csv,5
Upvotes: 1
Views: 98
Reputation: 26505
If you're using GNU awk another potential solution is to use the ENDFILE special pattern, e.g. using @markp-fuso's example data:
cat filb.csv # empty
cat filc.csv
col1
cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5
# the 'cnt>0&&cnt?cnt:"0"' is to handle empty files
# basically, if there are lines print cnt otherwise, if
# there are no lines print "0"
As commented by @EdMorten, cnt+0
can be used instead of cnt>0&&cnt?cnt:"0"
to handle empty files (much easier to remember!), e.g.
awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, cnt+0; cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5
Upvotes: 4
Reputation: 34184
Adding a couple more files to the mix:
$ cat filb.csv # empty
$ cat filc.csv
col2
One awk
approach:
awk -v str='col1' ' # pass in string to search for
BEGIN { FS=OFS=","
for (i=1;i<ARGC;i++)
count[ARGV[i]]=0 # initialize counter for all files; addresses issue where a file may not have any matches or file is empty (ie, count==0)
}
{ for (i=1;i<=NF;i++) # loop through fields looking for a match and ...
if ($i==str) # if found then ...
count[FILENAME]++ # increment our counter
}
END { for (fname in count)
print fname,count[fname]
}
' fil?.csv
This generates:
file.csv,5
filb.csv,0
fild.csv,5
filc.csv,0
NOTES:
$i==str
- assumes we're looking for an exact match on the value in the field (as opposed to a substring of the field's value)str
on any field/column in the file, otherwise we'll need to add an additional input variable to designate which column(s) to searchsort
, or add some code to allow awk
to sort the output before printing to stdoutAn alternative grep|tr
idea:
$ grep -oc "$str" fil?.csv | tr ':' ','
filb.csv,0
filc.csv,0
fild.csv,5
file.csv,5
Upvotes: 3