Arun
Arun

Reputation: 1220

UNIX: Getting count occurance of numbers from a CSV file

I have a CSV file with first column & second column as ID,domain.

#Input.txt
1,google.com
1,cnn.com
1,dropbox.com
2,bbc.com
3,twitter.com
3,hello.com
3,example.com
4,twitter.com
.............

Now, I would like to get the count of IDs. Yes,this can be done in Excel/sheets but the file contains of about 1.5Million lines.

Expected Output: 
1,3
2,1
3,3
4,1

I tried using cat Input.txt | grep -c 1 and that which gives me count of '1' as 3 but I would like to do it for individual ID count all at once. Can any one help me on how to achieve this ?

Upvotes: 2

Views: 101

Answers (5)

nos
nos

Reputation: 229088

Here's one way, though the count is present in the 1. column:

$ zcat Input.txt.gz | cut -d , -f 1 | sort | uniq -c
      3 1
      1 2
      3 3
      1 4

Here's another way using awk:

$ awk -F , '{counter[$1]++}; 
  END {for (id in counter) printf "%s,%d\n",id,counter[id];}' Input.txt | 
  sort
1,3
2,1
3,3
4,1

Upvotes: 2

user000001
user000001

Reputation: 33317

Here is a pure awk solution. It doesn't map the entire file in memory, so it will probably use less memory that @Joda's answer, but it assumes that the file is sorted:

awk -F, -v OFS=, '$1==prev{c++;next}{print prev,c; c=1}{prev=$1}END{print prev,c}' file

Upvotes: 1

Daniel
Daniel

Reputation: 783

awk -F "," '{ ids[$1]++} END { for(id in ids) { print id, ids[id] } }' input

And input is the file with the data.

output:

1 3
2 1
3 3
4 1

Edit:// If you want a comma seperated output you need to set the output seperator like this:

awk -F "," 'BEGIN { OFS=","} { ids[$1]++} END { for(id in ids) { print id, ids[id] } }' input

output:

1,3
2,1
3,3
4,1

Upvotes: 2

jas
jas

Reputation: 10865

$ awk -F, '{ print $1 }' input.txt | uniq -c | awk '{ print $2 "," $1 }'
1,3
2,1
3,3
4,1

Upvotes: 1

Arkadiusz Drabczyk
Arkadiusz Drabczyk

Reputation: 12383

This will do the job in bash:

$ for i in {1..4}; do echo -n $i, >> OUTPUT && grep -c $i Input.txt >> OUTPUT; done
$ less OUTPUT
1,3
2,1
3,3
4,1

Upvotes: 1

Related Questions