Reputation: 1427

Awk command, print lines which are occurring only once in a csv file

I have a CSV file which may contain duplicate lines. I need help with an awk command that prints only the lines that occur exactly once in the file.

Example input file:

a,b
a,c
a,d
a,b
a,c
b,e
b,f
b,d
b,f
b,e

Output:

a,d
b,d

Thank you for your help.

Upvotes: 1

Answers (3)

Naibin Duan

Reputation: 3

Three ways to print the BLAST contigs that occur only once:

awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|awk '{arr[$1]++}END{for(i in arr)if(arr[i]==1)print i}'  

awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|cut -f 1|sort| uniq -u

awk 'NF>4' valsidate_1k_vs_gdd13|grep Chr|cut -f 1|sort |uniq -c |grep  '\ 1 Chr'

Upvotes: 0

Akshay Hegde

Reputation: 16997

Using awk:

awk '{arr[$0]++}END{for(i in arr)if(arr[i]==1)print i}' infile

Using sort and uniq:

$ sort file | uniq -u # -u prints only lines that are not repeated; -d prints only the repeated ones
a,d
b,d

Test Results:

$ cat file
a,b
a,c
a,d
a,b
a,c
b,e
b,f
b,d
b,f
b,e

$ awk '{arr[$0]++}END{for(i in arr)if(arr[i]==1)print i}' file
a,d
b,d

Explanation:

arr[$0]++: $0 is the current line (record) and is used as the array key. arr is an associative array, and arr[$0] holds the number of times that line has been seen, so each time awk reads the same line again, its count is incremented by one.
In the END block, the loop walks over the array and prints every key whose count is exactly one.
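
One caveat: awk does not specify the order in which for (i in arr) visits the keys, so the lines may not come out in input order. If the order matters, a two-pass variant is one option. This is only a minimal sketch: it assumes the input is a regular file that awk can read twice (not a pipe), and print_once is a hypothetical helper name, not part of the answer above.

```shell
# Hypothetical helper: print lines that occur exactly once, in input order.
# Assumes "$1" is a regular file, because awk reads it twice.
print_once() {
    awk '
        NR == FNR { count[$0]++; next }   # first pass: count every line
        count[$0] == 1                    # second pass: print lines counted once
    ' "$1" "$1"
}
```

Calling print_once file on the sample input prints a,d followed by b,d, in the order those lines appear in the file.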

Upvotes: 2

RomanPerekhrest

Reputation: 92884

The shortest one, using the uniq command (the <(...) process substitution requires bash, ksh, or zsh):

uniq -u <(sort file)

-u: print only the lines that are not repeated

The output:

a,d
b,d

Upvotes: 1
