Reputation: 2715
I have two single-column files. File 1 looks like:
red
green
blue
red
red
green
black
Items can be repeated in file 1, but no item is repeated in file 2. File 2 has the list:
red
green
I want to count how many times the items listed in file 2 appear in file 1. For example, I want to count the occurrences of red and green in file 1. The answer here is 5 (3 reds and 2 greens). I can start with
<file1 cut -d' ' -f1 | sort | uniq -c
to count the occurrences of each item (uniq -c only counts adjacent duplicates, so the input has to be sorted first) and then use the list in the second file to match the items one by one, but that will be inefficient for my file, which has a million rows.
Upvotes: 0
Views: 102
Reputation: 171
I have the same basic idea as fedorqui and Jotne, but I only use one array. For very large key files, this may be better, so I've included it for completeness (I also wrote it as a file, rather than a one-liner):
#!/usr/bin/awk -f
FNR==NR{
# this is the key file:
KEYS[$1]=0;
next ;
}
{
if($1 in KEYS){
KEYS[$1]++;
}
}
END{
for(i in KEYS){
print i " " KEYS[i]
}
}
[count_occurrences.awk $] ./co.awk f2.dat f1.dat
red 3
green 2
[count_occurrences.awk $]
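If only the overall total is needed (5 in the question's example), the same one-array idea can be reduced to a single counter. This is a minimal sketch rather than part of the script above; the function name count_total is a hypothetical helper, and it assumes the key file is not empty (otherwise FNR==NR would also be true for the data file).

```shell
#!/bin/sh
# count_total KEYFILE DATAFILE   (hypothetical helper name)
# Prints how many lines of DATAFILE have a first field listed in KEYFILE.
count_total() {
    awk '
        FNR == NR  { keys[$1]; next }   # first file: remember each key
        $1 in keys { total++ }          # second file: count rows whose key is known
        END        { print total + 0 }  # "+ 0" prints 0 when nothing matched
    ' "$1" "$2"
}
```

With the sample data from the question, count_total f2.dat f1.dat prints 5.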
Upvotes: 2
Reputation: 41456
Another awk variation:
awk 'NR==FNR {a[$1]++;next} a[$1] {b[$1]++} END {for (i in b) print i,b[i]}' file2 file1
red 3
green 2
Upvotes: 2
Reputation: 289835
If you want to count how many overall, do:
$ grep -cf file2 file1
5
grep -c stands for count, and -f means "Obtain patterns from FILE, one per line".
Step by step:
$ grep -f file2 file1
red
green
red
red
green
$ grep -cf file2 file1
5
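One caveat: by default grep treats each line of file2 as a regular expression and matches it anywhere in a line, so the pattern red would also count a line such as darkred. If literal, whole-line matches are wanted, the standard -F (fixed strings) and -x (match the whole line) options can be added. A small sketch, assuming single-column files without trailing whitespace; the function name count_exact is a hypothetical helper:

```shell
#!/bin/sh
# count_exact PATTERNFILE DATAFILE   (hypothetical helper name)
# Counts lines of DATAFILE that are exactly equal to some line of PATTERNFILE.
count_exact() {
    # -c count matching lines, -F literal strings, -x whole-line match,
    # -f read the patterns from a file
    grep -c -F -x -f "$1" "$2"
}
```

For the question's sample files this gives the same result as grep -cf file2 file1, but it would not count a line like darkred.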
If you want the count for each item separately, do:
$ awk 'NR==FNR {a[$1]=$1; next} {if (a[$1]) b[$1]++} END {for (i in b) print i, b[i]}' file2 file1
green 2
red 3
NR==FNR {a[$1]=$1; next} reads the first file and stores the possible values.
{if (a[$1]) b[$1]++} runs on the second file: if the first column is in the array of possible values, it increments the counter array b[].
END {for (i in b) print i, b[i]} prints the results.
Upvotes: 2
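A variation on the per-item count, offered as a sketch rather than a correction: testing membership with awk's in operator instead of reading a[$1] avoids creating an empty array entry for every unlisted value seen in the second file, which may matter with a million rows. It also does not depend on the stored value being "true", so a key such as 0 should be counted as well. The function name count_per_key is a hypothetical helper.

```shell
#!/bin/sh
# count_per_key KEYFILE DATAFILE   (hypothetical helper name)
# Prints "item count" for every key that occurs at least once in DATAFILE.
count_per_key() {
    awk '
        NR == FNR    { wanted[$1]; next }   # key file: record the key, no value needed
        $1 in wanted { count[$1]++ }        # data file: "in" creates no new entries
        END          { for (k in count) print k, count[k] }
    ' "$1" "$2"
}
```

The order of a for (k in count) loop is unspecified in awk, so pipe the output through sort if a stable order is needed, e.g. count_per_key file2 file1 | sort.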