Reputation: 105
There are many questions similar to this, but all the ones I have seen are about sorting and counting repeated strings starting from the first line of data. In my case I need to leave the first line intact and on top, while running sort | uniq -c on all the following lines. I have the sorting part working; the only part I'm stuck on is uniq -c. I've tried pipes, calling system("uniq -c"), and other combinations of system(...), but nothing seems to work. My current command line looks like this, but it only gets as far as sorting:
myProgram input_file other_input_file | awk 'NR<2{print $0;next}{print $0 | "sort"}'
and from this I get:
Id: revision_data #this needs to stay on top
0
0
10.1007/S00253-012-4050-Z
10.1007/S00775-006-0142-5
10.1021/ACS.BIOCHEM.5B00958
10.1021/BI020286F
10.1038/35422
10.1093/NAR/28.8.1743
10.1093/NAR/GKN245
10.7554/ELIFE.00813
while what I need is this:
Id: revision_data
2 0
1 10.1007/S00253-012-4050-Z
1 10.1007/S00775-006-0142-5
1 10.1021/ACS.BIOCHEM.5B00958
1 10.1021/BI020286F
1 10.1038/35422
1 10.1093/NAR/28.8.1743
1 10.1093/NAR/GKN245
1 10.7554/ELIFE.00813
How could I insert uniq -c into my commands to get the output that I need?
Upvotes: 0
Views: 333
Reputation: 785128
You may use this GNU awk command (PROCINFO["sorted_in"] is a gawk extension):
awk 'NR == 1 { print; next } { ++freq[$0] } END {
PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in freq) print freq[i], i }' file
Id: revision_data
2 0
1 10.1007/S00253-012-4050-Z
1 10.1007/S00775-006-0142-5
1 10.1021/ACS.BIOCHEM.5B00958
1 10.1021/BI020286F
1 10.1038/35422
1 10.1093/NAR/28.8.1743
1 10.1093/NAR/GKN245
1 10.7554/ELIFE.00813
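If gawk is not available, a rough portable sketch is to let awk print the header itself and pipe the remaining lines to sort | uniq -c, which is close to what the question attempted. The function name count_after_header and the sample input are made up for illustration; the fflush() call is there so the header is written out before the pipeline produces anything.

```shell
# Hypothetical helper: print line 1 unchanged, then count duplicates in the rest.
count_after_header() {
    awk 'NR == 1 { print; fflush(); next }   # header goes straight to stdout
         { print | "sort | uniq -c" }'       # all other lines go through the pipeline
}

# Sample input, for illustration only
printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0 | count_after_header
```

Note that uniq -c pads the counts with leading spaces, so the output is indented compared with the awk-only version above.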
Upvotes: 2
Reputation: 17356
You could concatenate the first line of the file with your command applied to the other lines.
For example:
cat <(head -n1 filename) <(sort <(tail -n+2 filename) | uniq -c)
This applies sort | uniq -c to all lines starting with the second (via tail -n+2). The result is concatenated after the first line (obtained via head -n1) using cat.
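This approach reads the file twice, and the <(...) syntax needs a shell such as bash. When the data comes from a pipeline, as in the question, one possible adaptation is to capture it in a temporary file first. The sketch below assumes mktemp is available; generate_data is a made-up stand-in for the real producer (myProgram in the question) and count_via_tempfile is a hypothetical helper name.

```shell
# Hypothetical stand-in for the real producer (myProgram in the question)
generate_data() {
    printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0
}

# Hypothetical helper: buffer stdin in a temp file so it can be read twice.
count_via_tempfile() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                          # capture the whole stream
    head -n 1 "$tmp"                      # header, untouched
    tail -n +2 "$tmp" | sort | uniq -c    # counts for the remaining lines
    rm -f "$tmp"
}

generate_data | count_via_tempfile
```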
Upvotes: 1
Reputation: 140990
Just read the first line, print it, and then run the rest of your pipeline on the remaining input.
{
IFS= read -r firstline
printf "%s\n" "$firstline"
sort | uniq -c
} < input_file
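Since read consumes only the first line, the same group should also work on piped input, which matches the question's setup where the data comes from another program. A minimal sketch, with header_then_count as a made-up function name and printf standing in for the real producer:

```shell
# Hypothetical helper wrapping the command group above.
header_then_count() {
    IFS= read -r firstline || return 1   # take only the first line from stdin
    printf '%s\n' "$firstline"           # print it unchanged
    sort | uniq -c                       # count duplicates in what is left
}

# printf stands in for: myProgram input_file other_input_file
printf '%s\n' 'Id: revision_data' 0 10.1038/35422 0 | header_then_count
```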
Upvotes: 1