How do I count the number of occurrences of a string in a column in each file, and output the result filename and count + awk

Question

I have these 2 files, :

>cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5


> cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5

how do I get this output:(basically count the number of occurrences of a string (e.g. "col1")in a column in each file )

file.csv,5
fild.csv,5

Below, my attempts, for my reference:

Output column/field1

> awk -F, '$1 =="col1" {print $1}' file.csv
col1
col1
col1
col1
col1

Output filenamecolumn/field1, how do I add a comma as separator

> awk -F, '$1 =="col1" {print FILENAME $1}' file.csv
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1

output Id like

file.csv,5

attempt working on 2 files:

> awk -F, '$1 =="col1" {print FILENAME $1}' fil*.csv
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1

But the output i'd like is this:

file.csv,5
fild.csv,5

Answer

this works for me:

awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
fild.csv,5
file1.csv,0
file.csv,5

jared_mamrot · Accepted Answer

If you're using GNU awk another potential solution is to use the ENDFILE special pattern, e.g. using @markp-fuso's example data:

cat filb.csv            # empty
cat filc.csv
col1

cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5

cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5


awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5

# the 'cnt>0&&cnt?cnt:"0"' is to handle empty files
# basically, if there are lines print cnt otherwise, if
# there are no lines print "0"

Edit

As commented by @EdMorten, cnt+0 can be used instead of cnt>0&&cnt?cnt:"0" to handle empty files (much easier to remember!), e.g.

awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, cnt+0; cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5

How do I count the number of occurrences of a string in a column in each file, and output the result filename and count + awk

Below, my attempts, for my reference:

Answer

Answers (2)

Edit

Related Questions