Vecta

Reputation: 2350

Find Values in CSV that only Appear Once

I have a csv file with thousands of lines in it. I'd like to be able to find values that only appear once in this file.

For instance

dog
dog
cat
dog
bird

I'd like to get as my result:

cat
bird

I tried using the following awk command but it returned one of each value in the file:

awk -F"," '{print $1}' test.csv|sort|uniq

Returns:

bird
cat
dog

Thank you for your help!

Upvotes: 1

Views: 88

Answers (4)

Chris Koknat

Reputation: 3451

If Perl is an option, this code is similar to @glenn jackman's:

perl -F, -lane '$c{$F[0]}++; END{for $k (sort keys %c){print $k if $c{$k} == 1}}' test.csv

These command-line options are used:

  • -n loop around each line of the input file
  • -l removes newlines before processing, and adds them back in afterwards
  • -a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
  • -e execute the perl code
  • -F autosplit modifier, in this case splits on ,

@F is the array of words in each line, indexed starting with $F[0]

Upvotes: 0

Benjamin W.

Reputation: 52441

Cutting to first field, then sorting and displaying only uniques:

cut -d ',' -f 1 test.csv | sort | uniq -u

That is, your command would work if you appended -u to the final uniq. This version just uses cut instead of awk to extract the first field.
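
To see the difference on the sample data from the question, here is a small sketch. The scratch file created with mktemp and the variable names are assumptions for illustration; substitute test.csv in practice.

```shell
# Sample data from the question, written to a scratch file (hypothetical path).
sample=$(mktemp)
printf '%s\n' dog dog cat dog bird > "$sample"

# Plain uniq collapses adjacent duplicates: one line per distinct value.
all_values=$(cut -d ',' -f 1 "$sample" | sort | uniq)

# uniq -u keeps only the lines that are not repeated at all.
singletons=$(cut -d ',' -f 1 "$sample" | sort | uniq -u)

printf 'uniq:\n%s\n' "$all_values"
printf 'uniq -u:\n%s\n' "$singletons"
rm -f "$sample"
```

Both forms rely on sort running first, because uniq only compares adjacent lines.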

Upvotes: 0

glenn jackman

Reputation: 247042

Just with awk:

awk -F, '{count[$1]++} END {for (key in count) if (count[key] == 1) print key}' test.csv

Upvotes: 3

matchew

Reputation: 19665

Close. Try:

awk -F"," '{print $1}' test.csv |sort | uniq -c | awk '{if ($1 == 1) print $2}'

The -c flag on uniq prefixes each line with its count. The second awk then selects the lines whose count (the first field) is 1 and prints the value (the second field, $2).

The only caveat is that this returns bird before cat, because the input was sorted first. You could pipe once more to sort -r to reverse the sort direction. That matches the expected output in your question, but it is reverse alphabetical order, not the original order in the file.
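
If the order of first appearance in the file matters, one option that is not part of the pipeline above is to skip sort entirely and read the file twice with awk. This is a sketch under the assumption that the file is small enough to read twice; the scratch file and variable names are illustrative only.

```shell
# Sample data from the question, written to a scratch file (hypothetical path).
sample=$(mktemp)
printf '%s\n' dog dog cat dog bird > "$sample"

# Pass 1 (NR == FNR is true only while reading the first file argument):
#   count how often each first field occurs.
# Pass 2 (same file given again):
#   print the first field of every line whose count is exactly 1,
#   which preserves the order in which the lines appear in the file.
in_order=$(awk -F, 'NR == FNR { count[$1]++; next }
                    count[$1] == 1 { print $1 }' "$sample" "$sample")

printf '%s\n' "$in_order"
rm -f "$sample"
```

On the sample data this prints cat and then bird, the same order as in the input.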

Upvotes: 1
