rs232

Reputation: 19

Order lines by number of occurrences

Given a list with one element per line (with occasionally some blank lines), e.g.:

22008
6881
6881
22008

6881
22008
22008
6881

56515
8080
8080
56515

22008
45682
45682
22008

I would like to get as output a list with unique items sorted by number of occurrences:

22008 - 6
6881 - 4
8080 - 2
45682 - 2
56515 - 2

Thanks!

Upvotes: 1

Views: 122

Answers (3)

Vidhya G

Reputation: 2320

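You can use awk and sort. The array cnt is indexed by the number in column 1 ($1), and cnt[$1]++ adds 1 to that entry for every matching line. The END block prints each value with its count, and the result is piped (|) to sort, which orders it on the count in column 3 (-k3,3), numerically (n) and in reverse (r). A plain -rk2 would compare the counts as text, so a count of 10 would sort below a count of 2.

awk '/[0-9]/ {cnt[$1]++} END {for (k in cnt) print k, "- " cnt[k]}' file.txt | sort -k3,3nr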

If you remove the /[0-9]/, you'll also get the number of blank lines as a bonus :).

If you want, you can use /^[0-9]+$/ to match the whole line; but since we use $1 for counting, it doesn't really matter here.
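A minimal sketch of the same awk-plus-sort idea, wrapped in a function so it can be tried on made-up input where one value occurs more than nine times. The name count_by_occurrence and the test data are hypothetical; it uses NF rather than /[0-9]/ to skip blank lines, and breaking ties by value is an assumption about the desired order.

```shell
#!/bin/sh
# Hypothetical helper: count non-blank lines read from stdin and print
# "value - count", most frequent first.
count_by_occurrence() {
    awk 'NF { cnt[$1]++ }
         END { for (k in cnt) print k, "-", cnt[k] }' |
    sort -k3,3nr -k1,1n    # count descending (numeric), then value ascending
}

# Made-up input: 7 occurs 12 times, 3 occurs twice, 5 occurs once.
{
    i=0
    while [ "$i" -lt 12 ]; do echo 7; i=$((i + 1)); done
    echo 3; echo; echo 3; echo 5
} | count_by_occurrence
```

Because the count is compared numerically, the line for 7 (12 occurrences) should come out ahead of the line for 3 (2 occurrences).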

Upvotes: 1

halfflat

Reputation: 1584

The uniq command has an option -c to emit the number of consecutive occurrences it finds. The solution, then, is to first remove blank lines and sort the list so that identical lines are adjacent for uniq -c, then sort the output on the first field, which contains the occurrence count.

Output of sed '/^\s*$/d' | sort | uniq -c | sort -k1nr is

   6 22008
   4 6881
   2 45682
   2 56515
   2 8080

Note the option to sort at the end: -k1nr means sort on the first field, numerically, in reverse (i.e. descending) order.
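If the exact "value - count" layout from the question is wanted, one possible extension is to swap the two columns with awk at the end. This is only a sketch: the function name occurrences is made up, [[:space:]] is used in place of \s for portability, and ordering ties by value is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read values from stdin, print "value - count",
# most frequent first; ties are ordered by value (an assumption).
occurrences() {
    sed '/^[[:space:]]*$/d' |      # drop blank and whitespace-only lines
    sort |                         # make identical lines adjacent for uniq
    uniq -c |                      # prefix each distinct line with its count
    sort -k1,1nr -k2,2n |          # count descending, then value ascending
    awk '{ print $2 " - " $1 }'    # swap the columns into "value - count"
}

# The sample list from the question, with '' standing in for the blank lines.
printf '%s\n' 22008 6881 6881 22008 '' 6881 22008 22008 6881 '' \
    56515 8080 8080 56515 '' 22008 45682 45682 22008 | occurrences
```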

Upvotes: 1

John1024

Reputation: 113834

Numbers sorted by number of occurrences:

$ grep -vE '^$' file | sort | uniq -c | sort -rn
      6 22008
      4 6881
      2 8080
      2 56515
      2 45682

How it works

  • grep -vE '^$' file

    Remove empty lines from file

  • sort | uniq -c

    Sort the numbers then print unique ones with a count of their occurrences.

  • sort -rn

    Sort numerically in declining order by occurrence count.
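To see why the first sort in the pipeline above is needed, here is a small sketch on made-up input (the sample function and its values are hypothetical). uniq -c only merges adjacent duplicates, so a value that repeats non-adjacently is reported more than once unless the input is sorted first.

```shell
#!/bin/sh
# Hypothetical demo input: 8080 appears twice, but not on adjacent lines.
sample() { printf '%s\n' 8080 6881 8080; }

# Without sort, uniq -c only merges adjacent duplicates,
# so 8080 shows up on two separate lines with a count of 1 each.
sample | uniq -c

# With sort first, identical lines become adjacent and are counted together.
sample | sort | uniq -c | sort -rn
```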

Upvotes: 2
