Reputation: 344
I have a file named c01_1_664_blastx.blast, and its contents look like this:
c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83 47 16 1 20 160 840 885 5e-09 57.9
c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02 131 45 3 567 199 733 861 3e-61 149
..................
...................
...................
I need to count the number of unique values in the first column (c1_g1_i1, c3_g1_i1) with a shell command. Can anyone help me with this? Thanks in advance. Cheers
Upvotes: 0
Views: 63
Reputation: 785058
You don't need multiple piped commands for this. A single awk can handle it:
awk '!seen[$1]++{} END{print length(seen)}' file
2
Or:
awk '!seen[$1]++{i++} END{print i}' file
This awk command maintains an associative array seen keyed on the unique values, and in the END block it prints the length of the array. Note that calling length() on an array is a gawk extension; the second form, which increments a counter instead, is portable.
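To illustrate, here is a small runnable sketch. The sample.blast file is hypothetical: it reuses the two rows from the question and adds an invented duplicate of the c1_g1_i1 row, so the expected count of distinct first-column values is 2.

```shell
# Build a hypothetical sample file in the question's format; the second
# c1_g1_i1 row is invented to demonstrate deduplication.
printf '%s\n' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 70.00' \
  'c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02' > sample.blast

# Portable counter form: n is incremented only the first time a key is seen.
awk '!seen[$1]++{n++} END{print n}' sample.blast   # prints 2
```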
Upvotes: 1
Reputation: 16737
cut -d' ' -f 1 input_file | sort | uniq | wc -l
You can use cut to specify the delimiter and the field you want to extract. After extracting the first field, you sort it and apply uniq to keep only the unique entries, then count them by piping to wc -l. Note that you could use uniq -c instead to get a count of how many times each unique entry appears.
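A runnable sketch of this pipeline, using a hypothetical sample.blast built from the question's rows plus an invented duplicate; sort -u is shown as an equivalent shorthand that collapses the sort | uniq steps:

```shell
# Hypothetical sample in the question's format, with one duplicated key.
printf '%s\n' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 63.83' \
  'c1_g1_i1 sp|P87048|RPN1_SCHPO 70.00' \
  'c3_g1_i1 sp|Q6DCX2|AGO2_XENLA 58.02' > sample.blast

# Count distinct first-column values (output may be space-padded on BSD wc).
cut -d' ' -f 1 sample.blast | sort | uniq | wc -l

# sort -u merges the sort and uniq steps into one command.
cut -d' ' -f 1 sample.blast | sort -u | wc -l
```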
Upvotes: 2