Reputation: 2350
I have a CSV file with thousands of lines in it. I'd like to find values that appear only once in this file.
For instance
dog
dog
cat
dog
bird
I'd like to get as my result:
cat
bird
I tried using the following awk command, but it returned one of each value in the file:
awk -F"," '{print $1}' test.csv|sort|uniq
Returns:
dog
cat
bird
Thank you for your help!
Upvotes: 1
Views: 88
Reputation: 3451
If Perl is an option, this code is similar to @glenn jackman's:
perl -F, -lane '$c{$F[0]}++; END{for $k (sort keys %c){print $k if $c{$k} == 1}}' test.csv
These command-line options are used:

-n : loop around each line of the input file
-l : removes newlines before processing, and adds them back afterwards
-a : autosplit mode – split input lines into the @F array; defaults to splitting on whitespace
-e : execute the Perl code
-F : autosplit modifier, in this case splits on ,

@F is the array of words in each line, indexed starting with $F[0]
Upvotes: 0
Reputation: 52441
Cutting to first field, then sorting and displaying only uniques:
cut -d ',' -f 1 test.csv | sort | uniq -u
That is, if you append -u to the uniq in your command, it would work. This version just uses cut instead of awk.
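A quick check with the sample data from the question (written to test.csv first):

```shell
# Create the sample file from the question
printf 'dog\ndog\ncat\ndog\nbird\n' > test.csv

# uniq -u prints only lines that are NOT repeated in the sorted input
cut -d ',' -f 1 test.csv | sort | uniq -u
# bird
# cat
```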
Upvotes: 0
Reputation: 247042
Just with awk:
awk -F, '{count[$1]++} END {for (key in count) if (count[key] == 1) print key}' test.csv
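One thing worth knowing: awk's "for (key in count)" loop iterates in an unspecified order, so if you need deterministic output, pipe through sort. A quick run against the sample data from the question (written to test.csv first):

```shell
# Create the sample file from the question
printf 'dog\ndog\ncat\ndog\nbird\n' > test.csv

# "for (key in count)" order is unspecified in awk,
# so sort the output for a deterministic result
awk -F, '{count[$1]++} END {for (key in count) if (count[key] == 1) print key}' test.csv | sort
# bird
# cat
```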
Upvotes: 3
Reputation: 19665
Close. Try:
awk -F"," '{print $1}' test.csv |sort | uniq -c | awk '{if ($1 == 1) print $2}'
The -c flag on uniq gives you counts. The second awk then looks for any items with a count of 1 (first field) and prints the value of the second field ($2).
The only caveat is that this returns bird before cat, since the data was previously sorted. You could pipe once more to sort -r to reverse the sort direction. That would be identical to the expected answer you asked for, but it is not the original sort order.
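To make the two stages concrete, here is a run against the sample data from the question (recreated in test.csv), first showing the intermediate counts, then the final filtered result:

```shell
# Create the sample file from the question
printf 'dog\ndog\ncat\ndog\nbird\n' > test.csv

# Intermediate stage: uniq -c prefixes each value with its count
# (here: 1 for bird, 1 for cat, 3 for dog)
awk -F"," '{print $1}' test.csv | sort | uniq -c

# Final stage: the second awk keeps only rows whose count is 1
awk -F"," '{print $1}' test.csv | sort | uniq -c | awk '{if ($1 == 1) print $2}'
# bird
# cat
```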
Upvotes: 1