Reputation: 1220
I have a CSV file whose first and second columns are ID and domain:
#Input.txt
1,google.com
1,cnn.com
1,dropbox.com
2,bbc.com
3,twitter.com
3,hello.com
3,example.com
4,twitter.com
.............
Now, I would like to get the count of rows for each ID. Yes, this could be done in Excel/Sheets, but the file contains about 1.5 million lines.
Expected Output:
1,3
2,1
3,3
4,1
I tried using cat Input.txt | grep -c 1, which gives me the count of '1' as 3, but I would like to get the counts for all IDs at once. Can anyone help me achieve this?
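Note that grep -c 1 counts any line containing a 1 anywhere, so with other data it could also match an ID like 10 or a digit inside a domain; anchoring the pattern (a sketch) is stricter, but it still handles only one ID at a time:
$ grep -c '^1,' Input.txt
3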
Upvotes: 2
Views: 101
Reputation: 229088
Here's one way, though the count ends up in the first column:
$ zcat Input.txt.gz | cut -d , -f 1 | sort | uniq -c
3 1
1 2
3 3
1 4
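If you want the id,count format from the question, one option (just a sketch) is to post-process the uniq -c output with a second awk step that swaps the two fields:
$ zcat Input.txt.gz | cut -d , -f 1 | sort | uniq -c | awk '{print $2","$1}'
1,3
2,1
3,3
4,1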
Here's another way using awk:
$ awk -F , '{counter[$1]++};
END {for (id in counter) printf "%s,%d\n",id,counter[id];}' Input.txt |
sort
1,3
2,1
3,3
4,1
Upvotes: 2
Reputation: 33317
Here is a pure awk solution. It doesn't hold the entire file in memory, so it will probably use less memory than @Joda's answer, but it assumes that the file is sorted:
awk -F, -v OFS=, '$1==prev{c++;next}{if(NR>1)print prev,c; c=1; prev=$1}END{print prev,c}' file
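If the file is not sorted by ID, one option (a sketch) is to sort it on the first field before running the same one-pass count:
$ sort -t, -k1,1 file | awk -F, -v OFS=, '$1==prev{c++;next}{if(NR>1)print prev,c; c=1; prev=$1}END{print prev,c}'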
Upvotes: 1
Reputation: 783
awk -F "," '{ ids[$1]++} END { for(id in ids) { print id, ids[id] } }' input
Here, input is the file containing the data.
output:
1 3
2 1
3 3
4 1
Edit: If you want comma-separated output, you need to set the output field separator (OFS) like this:
awk -F "," 'BEGIN { OFS=","} { ids[$1]++} END { for(id in ids) { print id, ids[id] } }' input
output:
1,3
2,1
3,3
4,1
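As an aside, for(id in ids) does not iterate in any guaranteed order, so if you need the IDs sorted you can pipe the result through sort; equivalently, you can skip OFS and build the string directly (a sketch):
$ awk -F "," '{ ids[$1]++ } END { for(id in ids) { print id "," ids[id] } }' input | sort
1,3
2,1
3,3
4,1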
Upvotes: 2
Reputation: 10865
$ awk -F, '{ print $1 }' input.txt | uniq -c | awk '{ print $2 "," $1 }'
1,3
2,1
3,3
4,1
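Note that uniq -c only merges adjacent duplicates, so this relies on the IDs already being grouped as in the sample; for unsorted input you could add a sort in between (a sketch):
$ awk -F, '{ print $1 }' input.txt | sort | uniq -c | awk '{ print $2 "," $1 }'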
Upvotes: 1
Reputation: 12383
This will do the job in bash:
$ for i in {1..4}; do echo -n "$i," >> OUTPUT && grep -c "^$i," Input.txt >> OUTPUT; done
$ less OUTPUT
1,3
2,1
3,3
4,1
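If the IDs are not known in advance, you could derive them from the file instead of hard-coding {1..4} (a sketch; note it rescans the file once per ID, which may be slow on 1.5 million lines):
$ for i in $(cut -d , -f 1 Input.txt | sort -u); do echo -n "$i,"; grep -c "^$i," Input.txt; done > OUTPUT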
Upvotes: 1