Reputation: 1427
I have a CSV file which may contain duplicate lines. I need help with an awk command that prints only the lines that occur exactly once in the file.
E.g., input file:
a,b
a,c
a,d
a,b
a,c
b,e
b,f
b,d
b,f
b,e
Output:
a,d
b,d
Thank you for your help.
Upvotes: 1
Views: 917
Reputation: 3
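Three ways to print the BLAST contigs (first column) that occur only once. Each pipeline first keeps only lines with more than four fields that contain "Chr":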
awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|awk '{arr[$1]++}END{for(i in arr)if(arr[i]==1)print i}'
awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|cut -f 1|sort| uniq -u
awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|cut -f 1|sort |uniq -c |grep '^ *1 Chr'
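The three pipelines above key on the first column of a BLAST table. Applied to the question's CSV, where the whole line is the key, the same three approaches might be sketched as below. This is only an illustration: the NF/grep/cut filtering is dropped, and the file name sample.csv and the function names once_awk, once_uniq and once_count are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo file holding the question's sample input.
printf '%s\n' a,b a,c a,d a,b a,c b,e b,f b,d b,f b,e > sample.csv

# 1. awk: count each whole line, print the ones seen exactly once.
#    "for (k in count)" has no defined order, so the result is sorted.
once_awk() {
    awk '{ count[$0]++ } END { for (k in count) if (count[k] == 1) print k }' "$1" | sort
}

# 2. sort + uniq -u: uniq prints only lines with no identical neighbour.
once_uniq() {
    sort "$1" | uniq -u
}

# 3. sort + uniq -c: keep lines whose count is exactly 1, then strip the
#    leading count column that uniq -c adds.
once_count() {
    sort "$1" | uniq -c | sed -n 's/^ *1 //p'
}

once_awk sample.csv
once_uniq sample.csv
once_count sample.csv
```

Each function should print a,d and b,d. The third variant is mainly handy when you also want to see the counts; removing the sed step shows them.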
Upvotes: 0
Reputation: 16997
Using awk:
awk '{arr[$0]++}END{for(i in arr)if(arr[i]==1)print i}' infile
Using sort and uniq:
$ sort file | uniq -u # -u prints only lines that are not repeated; -d prints one copy of each repeated line
a,d
b,d
Test Results:
$ cat file
a,b
a,c
a,d
a,b
a,c
b,e
b,f
b,d
b,f
b,e
$ awk '{arr[$0]++}END{for(i in arr)if(arr[i]==1)print i}' file
a,d
b,d
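Explanation:
In arr[$0]++, $0 is the current line (record), which is used as the key of the associative array arr. arr[$0]++ increments the count stored for that key, so every time awk sees a line it has already seen, that line's count goes up by one.
In the END block, awk loops over the array and prints every key whose count is exactly one. Note that for (i in arr) visits keys in an unspecified order, so the output is not guaranteed to follow the input order.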
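If the singletons should come out in their original input order, one common awk idiom is to read the file twice: the first pass only counts, the second prints. A minimal sketch follows; the file name sample.csv and the function name once_in_order are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo file holding the question's sample input.
printf '%s\n' a,b a,c a,d a,b a,c b,e b,f b,d b,f b,e > sample.csv

# Print lines that occur exactly once, in their original order.
# NR == FNR holds only while reading the first copy of the file, which
# just counts each line; during the second pass the pattern
# count[$0] == 1 (with no action) prints lines whose total count is 1.
once_in_order() {
    awk 'NR == FNR { count[$0]++; next } count[$0] == 1' "$1" "$1"
}

once_in_order sample.csv
```

This needs no sort, but because the input is read twice it has to be a regular file; it will not work on a pipe or stdin.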
Upvotes: 2
Reputation: 92884
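The shortest one, using the uniq command:
uniq -u <(sort file)
-u: only print unique lines. (The <(...) process substitution works in bash, ksh and zsh; in a plain POSIX sh use sort file | uniq -u instead.)
The output: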
a,d
b,d
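The sort step matters because uniq only compares each line with the line immediately before it. A small sketch on the question's data (written to a hypothetical sample.csv) shows the difference:

```shell
#!/bin/sh
# Hypothetical demo file holding the question's sample input.
printf '%s\n' a,b a,c a,d a,b a,c b,e b,f b,d b,f b,e > sample.csv

# Unsorted: no two identical lines are adjacent here,
# so uniq -u treats all 10 lines as unique and prints them all.
uniq -u sample.csv

# Sorted: identical lines become neighbours, so only a,d and b,d remain.
sort sample.csv | uniq -u
```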
Upvotes: 1