Bipo K

Reputation: 393

How to grep for multiple word occurrences from multiple files and list them grouped as rows and columns

Hello: I need your help to count word occurrences from multiple files and output them as rows and columns. I searched the site for a similar question but could not locate one, hence posting here.

Setup: I have 2 files with the following contents:

[a.log]
id,status
1,new
2,old
3,old
4,old
5,old

[b.log]
id,status
1,new
2,old
3,new
4,old
5,new

Results required: The result I require, preferably using the command line only, is:

file     count(new)    count(old)
a.log    1             4
b.log    3             2

Script: The script below gives me the count for a single word across multiple files. I am stuck trying to get results for multiple words. Please help.

grep -cw "old" *.log
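
For reference, with the two sample files above, that command prints one filename:count pair per file:

$ grep -cw "old" *.log
a.log:4
b.log:2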

Upvotes: 0

Views: 722

Answers (5)

RomanPerekhrest

Reputation: 92854

Awk solution:

awk 'BEGIN{ 
         FS=","; OFS="\t"; print "file","count(new)","count(old)";
         f1=ARGV[1]; f2=ARGV[2]     # get filenames
     }
     FNR==1{ next }                 # skip each file's header line
     NR==FNR{ c1[$2]++; next }      # accumulate occurrences of the 2nd field in 1st file
     { c2[$2]++ }                   # accumulate occurrences of the 2nd field in 2nd file
     END{ 
         print f1, c1["new"], c1["old"];
         print f2, c2["new"], c2["old"] 
     }' a.log b.log

The output:

file    count(new)  count(old)
a.log   1   4
b.log   3   2
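
If you have more than two files, the same idea generalizes without naming each file; a minimal sketch in plain POSIX awk, keyed on FILENAME (the word list "new"/"old" is still hard-coded):

awk -F, -v OFS='\t' '
    FNR==1 { files[++nf]=FILENAME; next }   # record each file, skip its header
    { cnt[FILENAME,$2]++ }                  # count field-2 values per file
    END {
        print "file", "count(new)", "count(old)"
        for (i=1; i<=nf; i++)
            print files[i], cnt[files[i],"new"]+0, cnt[files[i],"old"]+0
    }' *.log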

Upvotes: 0

karakfa

Reputation: 67507

Without real multi-dimensional array support, this counts all values in field 2, not just "new"/"old". The header and the number of columns are dynamic as well, following the number of distinct values.

$ awk -F, 'NR==1 {fs["file"]} 
           FNR>1 {c[FILENAME,$2]++; fs[FILENAME]; ks[$2];
                  c["file",$2]="count("$2")"} 
           END   {for(f in fs) 
                    {printf "%s", f; 
                     for(k in ks) printf "%s", OFS c[f,k]; 
                     printf "\n"}}' file{1,2} | column -t


file   count(new)  count(old)
file1  1           4
file2  3           2
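
One caveat: for (f in fs) iterates in an unspecified order, so the header row is not guaranteed to print first. Under GNU awk you could pin the order by starting the END block with a sorted traversal; a sketch:

END {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk only: iterate keys in sorted order
    for (f in fs) {                          # "file" sorts before "file1"/"file2"
        printf "%s", f
        for (k in ks) printf "%s", OFS c[f,k]
        printf "\n"
    }
}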

Upvotes: 0

user unknown

Reputation: 36250

for c in a b ; do egrep -o "new|old" $c.log | sort | uniq -c > $c.luc; done 

Get rid of the header lines with grep (the -o match keeps only the words), then sort and count.

join -1 2 -2 2 a.luc b.luc
> new 1 3
> old 4 2

Placing a new header is left as an exercise for the reader. Is there a "flip" command for unix/linux/bash to transpose a table, or what would it be called?
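
There is no standard "flip" command in POSIX; GNU datamash, if installed, provides a transpose subcommand, and a small awk transpose is a common idiom. A sketch (assumes a rectangular, whitespace-separated table):

awk '{ for (i=1; i<=NF; i++) row[i] = row[i] (NR>1 ? OFS : "") $i
       if (NF > n) n = NF }
     END { for (i=1; i<=n; i++) print row[i] }'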

Handling empty cells is left as an exercise too, but possible with join.
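
For the record, a sketch of that with GNU join: -a emits unpairable lines from either file, -e 0 substitutes the missing count, and -o pins the output columns (join field, then each file's count field):

join -1 2 -2 2 -a 1 -a 2 -e 0 -o 0,1.1,2.1 a.luc b.luc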

Upvotes: 0

anubhava

Reputation: 785541

You can get this output using gnu-awk, which here accepts the comma-separated list of words to search for as a command-line argument:

awk -v OFS='\t' -F, -v wrds='new,old' 'BEGIN{n=split(wrds, a, /,/); for(i=1; i<=n; i++) b[a[i]]=a[i]} FNR==1{next} $2 in b{freq[FILENAME][$2]++} END{printf "%s", "file" OFS; for(i=1; i<=n; i++) printf "count(%s)%s", a[i], (i==n?ORS:OFS); for(f in freq) {printf "%s", f OFS; for(i=1; i<=n; i++) printf "%s%s", freq[f][a[i]], (i==n?ORS:OFS)}}' a.log b.log | column -t

Output:

file   count(new)  count(old)
a.log  1           4
b.log  3           2

PS: column -t is used only to format the output as an aligned table.

Readable awk:

awk -v OFS='\t' -F, -v wrds='new,old' 'BEGIN {
   n = split(wrds, a, /,/) # split input words list by comma with int index
   for(i=1; i<=n; i++)     # store words in another array with key as words
      b[a[i]]=a[i]
}
FNR==1 { 
   next # skip first row from all the files
}
$2 in b {
   freq[FILENAME][$2]++ # store per-file word frequencies in a 2-dimensional array
}
END { # print formatted result
   printf "%s", "file" OFS
   for(i=1; i<=n; i++)
      printf "count(%s)%s", a[i], (i==n?ORS:OFS)

   for(f in freq) {
      printf "%s", f OFS
      for(i=1; i<=n; i++)
         printf "%s%s", freq[f][a[i]], (i==n?ORS:OFS)
   }
}' a.log b.log

Upvotes: 1

HardcoreHenry

Reputation: 6387

I think you're looking for something like this, but it's not entirely clear what your objectives are (if efficiency matters, for example, this approach isn't particularly efficient)...

for file in *.log; do
    printf '%s\t' "$file"
    for word in new old; do
        printf '%s\t' "$(grep -cw "$word" "$file")"   # command substitution strips grep's trailing newline
    done
    echo
done

(For readability I used a simple glob in the first line; with the expansions quoted, as above, it also copes with spaces in filenames. If you need more elaborate file selection, you could switch it to find . -maxdepth 1 -iname "*.log" | while read -r file; do)
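
If you want the same header and alignment as the other answers, one way (a sketch) is to wrap the loop in a group and pipe it through column -t:

{ printf 'file\tcount(new)\tcount(old)\n'
  for file in *.log; do
      printf '%s\t' "$file"
      for word in new old; do
          printf '%s\t' "$(grep -cw "$word" "$file")"
      done
      echo
  done
} | column -t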

Upvotes: 0
