Kyle Gong
Kyle Gong

Reputation: 27

count patterns in a csv file from another csv file in bash

I have two csv files

File A

ID
1
2
3

File B

ID
1
1
1
1
3
2
3

What I want to do is to count how many times that a ID in File A show up in File B, and save the result in a new file C (which is in csv format). For example, 1 in File A shows up 4 times in File B. So in the new file C, I should have something like

File C
ID,Count
1,4
2,1
3,2

Originally I was thinking use "grep -f", but it seems like it only works with .txt format. Unfortunately, File A and B are both in csv format. So now, I am thinking maybe I could use a for loop to get the ID from File A individually and use grep -c to count each one of them. Any idea will be helpful.

Thanks in advance!

Upvotes: 1

Views: 72

Answers (2)

James Brown
James Brown

Reputation: 37454

You could use join, sort, uniq and process substitution <(command) creatively:

$ join -2 2 <(sort A) <(sort B | uniq -c) | sort -n > C
$ cat C
ID 1
1 4
2 1
3 2

And if you really really want the header to be ID Count, before writing to file C you could replace that 1 with Count with sed by adding:

... | sed 's/\(ID \)1/\1Count/' > C

to get

ID Count
1 4
2 1
3 2

and if you really really want commas as separators instead of spaces, to replace them with spaces using tr, add also:

... | tr \  , > C

to get

ID,Count
1,4
2,1
3,2

You could of course ditch the trand use the sed like this instead:

... | sed 's/\(ID \)1/\1Count/;s/ /,/' > C

And the output would be like above.

Upvotes: 0

anubhava
anubhava

Reputation: 785711

You can use this awk command:

awk -v OFS=, 'FNR==1{next} FNR==NR{a[$1]; next} $1 in a{freq[$1]++}
              END{print "ID", "Count"; for (i in freq) print i, freq[i]}' fileA fileB

ID,Count
1,4
2,1
3,2

Upvotes: 1

Related Questions