Kamilė Vainiūtė

Reputation: 105

Using sort and uniq -c in awk, starting from the second line of data

There are many similar questions, but all the ones I have seen sort and count repeated strings starting from the first line of data. In my case I need to leave the first line intact and on top, while running sort | uniq -c on all the following lines. I have the sorting part working; the part I'm stuck on is uniq -c. I've tried pipes, calling system("uniq -c"), and other combinations of system(...), but nothing seems to work. My current command looks like this, but it only gets as far as sorting:

myProgram input_file other_input_file | awk 'NR<2{print $0;next}{print $0 | "sort"}'

and from this I get:

Id: revision_data  #this needs to stay on top
0
0
10.1007/S00253-012-4050-Z
10.1007/S00775-006-0142-5
10.1021/ACS.BIOCHEM.5B00958
10.1021/BI020286F
10.1038/35422
10.1093/NAR/28.8.1743
10.1093/NAR/GKN245
10.7554/ELIFE.00813

while what I need is this:

Id: revision_data
   2 0
   1 10.1007/S00253-012-4050-Z
   1 10.1007/S00775-006-0142-5
   1 10.1021/ACS.BIOCHEM.5B00958
   1 10.1021/BI020286F
   1 10.1038/35422
   1 10.1093/NAR/28.8.1743
   1 10.1093/NAR/GKN245
   1 10.7554/ELIFE.00813

How can I add uniq -c to my command to get the output that I need?

Upvotes: 0

Views: 333

Answers (3)

anubhava

Reputation: 785128

You may use this GNU awk command (PROCINFO["sorted_in"] is a gawk extension):

awk 'NR == 1 { print; next } { ++freq[$0] } END {
     PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in freq) print freq[i], i }' file

Id: revision_data
2 0
1 10.1007/S00253-012-4050-Z
1 10.1007/S00775-006-0142-5
1 10.1021/ACS.BIOCHEM.5B00958
1 10.1021/BI020286F
1 10.1038/35422
1 10.1093/NAR/28.8.1743
1 10.1093/NAR/GKN245
1 10.7554/ELIFE.00813
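
For awks without PROCINFO["sorted_in"], a rough alternative is to stay closer to the command in the question. The string after | inside awk is run by the shell, so it can itself be a pipeline such as sort | uniq -c. The sketch below is only an illustration: the function name header_then_counts and the sample input are made up, and the fflush() call assumes the awk in use supports it (gawk, mawk and BusyBox awk do) so that the header is written before sort emits anything.

```shell
# header_then_counts: hypothetical helper name, reads from stdin.
header_then_counts() {
    awk 'NR == 1 { print; fflush(); next }  # print the header and flush it right away
         { print | "sort | uniq -c" }'      # all other lines go to a shell pipeline
}

# Sample input made up for illustration.
printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0 | header_then_counts
```

The width of the count column depends on the uniq implementation, so the leading spaces may differ from the ones shown in the question.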

Upvotes: 2

borrible

Reputation: 17356

You could concatenate the first line of the file with your command applied to the other lines.

For example:

cat <(head -n1 filename) <(sort <(tail -n+2 filename) | uniq -c)

This applies sort | uniq -c to all lines starting with the second (via tail -n+2). The result is concatenated after the first line (via head -n1) using cat. The <(...) process substitution requires a shell that supports it, such as bash.
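
The same idea can be sketched without process substitution, for shells that lack it, by running the two commands one after the other. The function name header_and_counts_file is hypothetical, and it assumes its argument is a regular file, since the input is read twice.

```shell
# header_and_counts_file: hypothetical helper; $1 must be a regular file,
# because head and tail each open it separately.
header_and_counts_file() {
    head -n 1 "$1"                     # first line, unchanged
    tail -n +2 "$1" | sort | uniq -c   # remaining lines, sorted and counted
}

# Sample data made up for illustration.
tmpfile=$(mktemp)
printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0 > "$tmpfile"
header_and_counts_file "$tmpfile"
rm -f "$tmpfile"
```

Because the file is opened twice, this form does not work directly on the output of a pipeline; that output would have to be saved to a file first.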

Upvotes: 1

KamilCuk

Reputation: 140990

Just read the first line, print it, and then run the rest of your pipeline on the remaining input.

{
   IFS= read -r firstline
   printf "%s\n" "$firstline"
   sort | uniq -c
} < input_file
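
Since the data in the question comes from a program rather than a file, the same group can also read from a pipe. The sketch below wraps it in a function with a made-up name, keep_header_count, and feeds it made-up sample input in place of the asker's myProgram.

```shell
# keep_header_count: hypothetical helper name, reads from stdin.
keep_header_count() {
    IFS= read -r firstline        # consume only the first line of stdin
    printf '%s\n' "$firstline"    # print it unchanged
    sort | uniq -c                # the rest of stdin is sorted and counted
}

# Sample input made up for illustration; in the question this would be
# the output of: myProgram input_file other_input_file
printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0 | keep_header_count
```

This works because read stops after the first newline and leaves the rest of the stream for sort.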

Upvotes: 1
