Reputation: 344
I have a file named c01_1_664_blastx.blast, and its contents look like this:
c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83 47 16 1 20 160 840 885 5e-09 57.9
c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02 131 45 3 567 199 733 861 3e-61 149
..................
...................
...................
I need to count the number of unique values in the first column (c1_g1_i1, c3_g1_i1) with a shell command. Can anyone help me with this? Thanks in advance. Cheers
Upvotes: 0
Views: 63
Reputation: 785058
You don't need multiple piped commands for this. A single awk can handle it:
awk '!seen[$1]++{} END{print length(seen)}' file
2
Or:
awk '!seen[$1]++{i++} END{print i}' file
This awk command maintains an associative array seen keyed on the unique values, and in the END block it prints the length of the array. Note that calling length() on an array is a gawk extension; the second form, which increments a counter instead, is portable.
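To illustrate, here is a small runnable sketch. The sample.blast file is hypothetical: it reuses the two rows from the question and adds an invented duplicate of the c1_g1_i1 row, so the expected count of distinct first-column values is 2.

```shell
# Build a hypothetical sample file in the question's format; the second
# c1_g1_i1 row is invented to demonstrate deduplication.
printf '%s\n' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 70.00' \
  'c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02' > sample.blast

# Portable counter form: n is incremented only the first time a key is seen.
awk '!seen[$1]++{n++} END{print n}' sample.blast   # prints 2
```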
Upvotes: 1
Reputation: 16737
cut -d' ' -f 1 input_file | sort | uniq | wc -l
You can use cut to specify the delimiter and the field you want to extract. After extracting the first field, you sort it and apply uniq to keep only the unique entries, then count them by piping to wc -l. Note that you could use uniq -c instead to get a count of how many times each unique entry appears.
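A runnable sketch of this pipeline, using a hypothetical sample.blast built from the question's rows plus an invented duplicate; sort -u is shown as an equivalent shorthand that collapses the sort | uniq steps:

```shell
# Hypothetical sample in the question's format, with one duplicated key.
printf '%s\n' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 70.00' \
  'c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02' > sample.blast

# Count distinct first-column values (output may be space-padded on BSD wc).
cut -d' ' -f 1 sample.blast | sort | uniq | wc -l

# sort -u merges the sort and uniq steps into one command.
cut -d' ' -f 1 sample.blast | sort -u | wc -l
```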
Upvotes: 2