UserYmY

Reputation: 8554

How to count number of rows per distinct row in Linux bash

I have a file like this:

id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz

And I would like to count the number of times each distinct id shows up in my data, like below:

9930|3
9931|5
9942|1

How can I do that with Linux bash? Currently I am using this, but it counts all lines together instead of per id:

cat filename | grep 'googspf.biz'| sort -t'|' -k1,1 | wc

Can anybody help?

Upvotes: 3

Views: 229

Answers (3)

glenn jackman

Reputation: 246774

`sed 1d` drops the header line, `cut` keeps the id field, and `sort | uniq -c` counts each distinct id:

sed 1d file | cut -d'|' -f1 | sort | uniq -c
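Note that `uniq -c` prints the count first, padded with spaces, rather than in the `id|count` form the question asks for. A small awk step can swap the columns; a sketch, recreating the sample data from the question:

```shell
# Recreate the sample input from the question (file name is illustrative)
cat > file <<'EOF'
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
EOF

# uniq -c emits "  count id"; awk reorders that into "id|count"
sed 1d file | cut -d'|' -f1 | sort | uniq -c | awk '{print $2 "|" $1}'
```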

Upvotes: 3

Gilles Quénot

Reputation: 185025

Try this:

awk -F'|' '
    /googspf.biz/{a[$1]++}
    END{for (i in a) {print i, a[i]}}
' OFS='|' file

or

awk '
    BEGIN {FS=OFS="|"}
    /googspf.biz/{a[$1]++}
    END{for (i in a) {print i, a[i]}}
' file
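One caveat: awk's `for (i in a)` iterates in an unspecified order, so if you need the ids in order, pipe the result through `sort`. A sketch using the question's sample data (also escaping the `.` so the regex matches the domain literally):

```shell
# Recreate the sample input from the question (file name is illustrative)
cat > file <<'EOF'
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
EOF

# Count ids for matching lines, then sort numerically on the first field
awk 'BEGIN{FS=OFS="|"} /googspf\.biz/{a[$1]++} END{for (i in a) print i, a[i]}' file |
    sort -t'|' -k1,1n
```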

Upvotes: 3

fredtantini

Reputation: 16556

I first thought of using uniq -c (-c is for count) since your data seems to be sorted:

~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c
      3 9930
      5 9931
      1 9942

And in order to format accordingly, I had to use awk:

~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c|awk '{print $2"|"$1}'
9930|3
9931|5
9942|1

But then, with awk only:

~$ awk -F'|' '/googspf/{a[$1]++}END{for (i in a){print i"|"a[i]}}' f
9930|3
9931|5
9942|1

-F'|' tells awk to use | as the field separator. For each line matching googspf (you could use NR>1 instead, i.e. every line whose number is greater than 1, to skip only the header), it increments a counter keyed on the first field. At the end, it prints each id with its count.
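The NR>1 variant mentioned above skips only the header line and counts every id, regardless of domain; a sketch, recreating the sample data (the file name f follows the examples above):

```shell
# Recreate the sample input from the question
cat > f <<'EOF'
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
EOF

# NR>1 skips the header; every remaining id is counted, whatever its domain
awk -F'|' 'NR>1{a[$1]++} END{for (i in a) print i "|" a[i]}' f | sort -t'|' -k1,1n
```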

Upvotes: 1
