ohlla

Reputation: 25

Sort a file by number of occurrences

I have a text file:

$ cat tempfile.txt
123
567
345
123
789
234
123
234
345
789

and my desired output is the following:

123,3
789,2
345,2

I need to (1) sort the values by number of occurrences, (2) when the counts are equal, put the larger numerical value first, and (3) show only the top 3.

I tried this:

tr -c '[:alnum:]' '[\n*]' < tempfile.txt | sort -nr | uniq -c | sort -nr | head  -3

But this only gives the output below. I still need to swap the count and the value, separate the two with a comma, and, when counts are equal, put the larger value first:

3 123
2 234
2 345

Upvotes: 2

Views: 472

Answers (4)

dawg

Reputation: 103884

You can use plain awk to count the values and a two-key sort to order them:

$ awk '{arr[$1]++} 
            END{for (e in arr) print e "," arr[e]}' file | 
            sort -t , -k2rn -k1rn 

Prints:

123,3
789,2
345,2
234,2
567,1

Then pipe the result through head to keep only as many lines as you want.
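Putting the pieces together, here is a minimal sketch of the whole pipeline with head added. The function name top3 is made up for illustration, and the explicit key ranges (-k2,2 and -k1,1) are my choice rather than the exact flags used above:

```shell
# top3 is a hypothetical helper name, not part of the answer above.
# Usage: top3 FILE
top3() {
    awk '{ count[$1]++ }
         END { for (v in count) print v "," count[v] }' "$1" |
        sort -t , -k2,2nr -k1,1nr |   # count descending, then value descending
        head -n 3                     # keep only the first three lines
}
```

Calling top3 tempfile.txt on the sample from the question should print 123,3, then 789,2, then 345,2.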

Upvotes: 1

Oliver Gaida

Reputation: 1938

Try this one-liner:

sort tempfile.txt | uniq -c | sort -nr -k 1,1 -k 2,2 | awk '{print $2","$1; if (NR == 3) exit}'

It gives:

123,3
789,2
345,2

Upvotes: 3

Timur Shtatland

Reputation: 12347

Use this command, which allows specific control of the sorting order:

sort -g tempfile.txt | uniq -c | perl -lane 'print join "\t", reverse @F;' | sort -k2,2gr -k1,1gr | head -n3 | tr '\t' ','

Output:

123,3
789,2
345,2

Upvotes: 0

RavinderSingh13

Reputation: 133545

With a single GNU awk command, you could try the following.

awk '
{
  a[$0]++
}
END{
  PROCINFO["sorted_in"] = "@val_num_desc"
  for(i in a){
    c[a[i]]=(c[a[i]]>i?c[a[i]]:i)
  }
  PROCINFO["sorted_in"] = "@ind_num_desc"
  for(o in c){
    print o,c[o]
  }
}' Input_file

For the provided sample, the output is as follows.

3 123
2 789
1 567

Explanation: the same program with detailed comments.

awk '                                        ## Start the awk program.
{
  a[$0]++                                    ## Count each line in array a, indexed by the line itself.
}
END{                                         ## END block: runs after all input has been read.
  PROCINFO["sorted_in"] = "@val_num_desc"    ## Make gawk traverse arrays by value, numerically descending (a GREAT gawk feature :)
  for(i in a){                               ## Traverse a.
    c[a[i]]=(c[a[i]]>i?c[a[i]]:i)            ## Build array c indexed by count: if the entry already stored
                                             ## for this count is greater than i, keep it; otherwise store i.
  }
  PROCINFO["sorted_in"] = "@ind_num_desc"    ## Make gawk traverse arrays by index, numerically descending (another GREAT gawk feature :)
  for(o in c){                               ## Traverse c.
    print o,c[o]                             ## Print each count and the value stored for it.
  }
}' Input_file                                ## Name of the input file.
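Note that the output above keeps only the largest value for each count, so 345 and 234 (each also seen twice) are dropped, and the columns come out as count then value. As a rough sketch that still stays inside one awk process but prints every value as value,count, the following picks the best remaining entry up to three times. It is not the gawk approach above: it assumes only POSIX awk features (no PROCINFO) and numeric input lines, and the function name top3_awk is made up for illustration:

```shell
# top3_awk is a hypothetical name; it takes the input file as $1.
top3_awk() {
    awk '
    { count[$0]++ }                      # tally each input line
    END {
        for (n = 1; n <= 3; n++) {       # emit at most three entries
            found = 0
            for (k in count) {
                # prefer a higher count; on a tie, the larger numeric value
                if (!found || count[k] > count[best] ||
                    (count[k] == count[best] && k + 0 > best + 0)) {
                    best = k
                    found = 1
                }
            }
            if (!found) break            # fewer than three distinct values
            print best "," count[best]
            delete count[best]           # exclude it from the next pass
        }
    }' "$1"
}
```

Scanning the array three times is fine for a top-3 request; for a long ranked list, piping to sort as in the other answers is likely the simpler choice.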

Upvotes: 0
