discipulus

Reputation: 2715

Count the occurrences of items in one file that are listed in another file?

I have two single-column files. File 1 looks like this:

red
green
blue 
red
red
green
black

Items can be repeated in file 1, but they are not repeated in file 2. File 2 contains the list

red
green

I want to count how many times the items listed in file 2 appear in file 1. For example, I want to count the occurrences of red and green in file 1; the answer here is 5 (three reds and two greens). I can start with

<file1 cut -d' ' -f1 | sort | uniq -c

to count the occurrences of each item and then match the entries of the second file against that list one by one, but that would be inefficient for my file, which has a million rows.

Upvotes: 0

Views: 102

Answers (3)

user3065349

Reputation: 171

I have the same basic idea as fedorqui and Jotne, but I only use one array. For very large key files, this may be better, so I've included it for completeness (I also wrote it as a file, rather than a one-liner):

#!/usr/bin/awk -f

FNR==NR {
    # this is the key file:
    KEYS[$1] = 0
    next
}

{
    if ($1 in KEYS) {
        KEYS[$1]++
    }
}

END {
    for (i in KEYS) {
        print i "  " KEYS[i]
    }
}

[count_occurrences.awk $] ./co.awk f2.dat f1.dat
red  3
green  2
[count_occurrences.awk $]
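
The question asks for a single total (5 in the example) rather than per-item counts. As a minimal sketch of the same one-array idea that prints only the total, something like the following should work. The function name count_total is hypothetical and only there for illustration; the first argument is assumed to be the key file (file 2) and the second the data file (file 1).

```shell
# Hypothetical helper: print how many rows of the data file
# have a first field that is listed in the key file.
count_total() {
    # $1 = key file (file 2), $2 = data file (file 1)
    awk '
        FNR==NR { keys[$1]; next }   # first file: remember each key
        $1 in keys { total++ }       # second file: count rows whose first field is a key
        END { print total + 0 }      # "+ 0" prints 0 instead of an empty line
    ' "$1" "$2"
}
```

Called as count_total f2.dat f1.dat, this reads each file once and keeps only the keys in memory.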

Upvotes: 2

Jotne

Reputation: 41456

Another awk variation

awk 'NR==FNR {a[$1]++;next} a[$1] {b[$1]++} END {for (i in b) print i,b[i]}' file2 file1
red 3
green 2

Upvotes: 2

fedorqui

Reputation: 289835

If you want the overall count, do:

$ grep -cf file2 file1
5

In grep, -c prints a count of the matching lines and -f means "Obtain patterns from FILE, one per line".

Step by step:

$ grep -f file2 file1
red
green
red
red
green

$ grep -cf file2 file1
5
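
One caveat: with -f alone, each line of file2 is treated as a regular expression that may match anywhere in a line, so a pattern like red would also count a line such as darkred. A hedged sketch that restricts the match to whole lines, using the standard -F (fixed strings) and -x (whole-line match) options, is shown here. The function name count_exact is hypothetical, and the sketch assumes the lines carry no trailing whitespace.

```shell
# Hypothetical helper: count lines of the data file that are
# exactly equal to one of the lines in the pattern file.
count_exact() {
    # $1 = pattern file (file 2), $2 = data file (file 1)
    # -F: treat patterns as fixed strings, not regular expressions
    # -x: a pattern must match the whole line
    # -c: print the number of matching lines
    grep -Fxc -f "$1" "$2"
}
```

For the sample files in the question, count_exact file2 file1 gives the same 5 as before; the two commands differ only when a key appears inside a longer line.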

If you want the count for each item, do:

$ awk 'NR==FNR {a[$1]=$1; next} {if (a[$1]) b[$1]++} END {for (i in b) print i, b[i]}' file2 file1
green 2
red 3
  • NR==FNR {a[$1]=$1; next} reads the first file and stores the possible values.
  • {if (a[$1]) b[$1]++} runs on the second file: if the first column is in the array of possible values, it increments that item's counter in the array b[].
  • END {for (i in b) print i, b[i]} prints the results.
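
A possible refinement, offered as a sketch rather than a correction: the test a[$1] looks at the stored value, so a key that looks like the number 0 would be treated as false, and each lookup of a value that is not a key creates an empty entry in a[]. Testing membership with the in operator avoids both. The function name count_per_item is hypothetical, and the final sort is only there because the order of for (i in b) is unspecified.

```shell
# Hypothetical helper: print "item count" for every key that
# occurs in the data file, sorted by item name.
count_per_item() {
    # $1 = key file (file 2), $2 = data file (file 1)
    awk '
        NR==FNR { a[$1]; next }             # first file: register each key
        $1 in a { b[$1]++ }                 # second file: count only registered keys
        END { for (i in b) print i, b[i] }  # one line per key that was seen
    ' "$1" "$2" | LC_ALL=C sort
}
```

Run as count_per_item file2 file1 on the sample data, this prints green 2 and red 3, matching the output above.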

Upvotes: 2
