Reputation: 27
I have two csv files
File A
ID
1
2
3
File B
ID
1
1
1
1
3
2
3
What I want to do is to count how many times that a ID in File A show up in File B, and save the result in a new file C (which is in csv format). For example, 1 in File A shows up 4 times in File B. So in the new file C, I should have something like
File C
ID,Count
1,4
2,1
3,2
Originally I was thinking use "grep -f", but it seems like it only works with .txt format. Unfortunately, File A and B are both in csv format. So now, I am thinking maybe I could use a for loop to get the ID from File A individually and use grep -c to count each one of them. Any idea will be helpful.
Thanks in advance!
Upvotes: 1
Views: 72
Reputation: 37454
You could use join
, sort
, uniq
and process substitution <(command)
creatively:
$ join -2 2 <(sort A) <(sort B | uniq -c) | sort -n > C
$ cat C
ID 1
1 4
2 1
3 2
And if you really really want the header to be ID Count
, before writing to file C
you could replace that 1
with Count
with sed
by adding:
... | sed 's/\(ID \)1/\1Count/' > C
to get
ID Count
1 4
2 1
3 2
and if you really really want commas as separators instead of spaces, to replace them with spaces using tr
, add also:
... | tr \ , > C
to get
ID,Count
1,4
2,1
3,2
You could of course ditch the tr
and use the sed
like this instead:
... | sed 's/\(ID \)1/\1Count/;s/ /,/' > C
And the output would be like above.
Upvotes: 0
Reputation: 785711
You can use this awk command:
awk -v OFS=, 'FNR==1{next} FNR==NR{a[$1]; next} $1 in a{freq[$1]++}
END{print "ID", "Count"; for (i in freq) print i, freq[i]}' fileA fileB
ID,Count
1,4
2,1
3,2
Upvotes: 1