Lilly Tooner

Reputation: 95

Counting unique values in a column with a shell script

I have a tab-delimited file with 5 columns and need to count the number of unique values in column 2. I would normally do this with Perl or Python, but I am forced to use the shell for this one.

In the past I have successfully used the *nix uniq utility piped to wc, but it looks like I am going to have to use awk here.

Any advice would be greatly appreciated. (I previously asked a similar question about column checks using awk, but this one is a little different, so I am asking it separately in case someone has the same question in the future.)

Many many thanks!
Lilly

Upvotes: 7

Views: 29137

Answers (3)

martin clayton

Reputation: 78225

I'd go for:

$ cut -f2 file.txt | sort -u | wc -l

At least in some versions, uniq relies on the input data being sorted (it looks only at adjacent lines).

For example in the Solaris docs:

The uniq utility will read an input file comparing adjacent lines, and write one copy of each input line on the output. The second and succeeding copies of repeated adjacent input lines will not be written.

Repeated lines in the input will not be detected if they are not adjacent.
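To see the difference, feed both commands a small input where the duplicates are not adjacent (a quick illustration using printf):

$ printf 'a\nb\na\n' | uniq | wc -l
3
$ printf 'a\nb\na\n' | sort -u | wc -l
2

uniq alone misses the repeated a because the two copies are separated by b; sorting first brings them together.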

Upvotes: 5

Vijay

Reputation: 67319

awk '{if($0~/Not Running/)a++;else if($0~/Running/)b++}END{print a,b}' temp
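This tallies Not Running versus Running lines rather than unique column values. For the count the question actually asks about, an awk sketch in the same spirit (assuming a tab-delimited file.txt, as in the other answers) might be:

$ awk -F'\t' '!seen[$2]++{count++} END{print count}' file.txt

seen[$2]++ is zero, and therefore false, only the first time a given column-two value appears, so count is incremented exactly once per distinct value.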

Upvotes: 0

unwind

Reputation: 400159

No need to use awk.

$ cut -f2 file.txt | sort | uniq | wc -l

should do it.

This uses the fact that tab is cut's default field delimiter, so we get just the content of column two. A pass through sort then serves as a pre-stage for uniq, which removes the duplicates. Finally we count the lines, which gives the number we are after.
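For example, with a small tab-delimited file where column two holds x, x and y (sample data for illustration):

$ printf 'a\tx\nb\tx\nc\ty\n' > file.txt
$ cut -f2 file.txt | sort | uniq | wc -l
2

The two x values collapse into one, leaving two unique entries.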

Upvotes: 22
