tech_help

Reputation: 42

Delete the duplicate entries from a column in a CSV file

I have a CSV file with more than 10,000 lines in two columns. I have to delete the duplicate entries from column 1. Sample input:

col1,col3
od1,pd1
od1,pd4
od2,pd1
od2,pd2
od3,pd6
od3,pd688
od3,pg7

Sample output:

col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7

Upvotes: 0

Views: 86

Answers (3)

Akshay Hegde

Reputation: 16997

If your file is sorted by the 1st field, then either of the following will work:

$ cat f
col1,col3
od1,pd1
od1,pd4
od2,pd1
od2,pd2
od3,pd6
od3,pd688
od3,pg7

# Either
$ awk  'BEGIN{FS=OFS=","}$1!=p{p=$1;print;next}{$1=""}1' f
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7

# Or
$ awk  'BEGIN{FS=OFS=","}$1==p{$1="";print;next}{p=$1}1' f
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
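
For readers who prefer the one-liner unpacked, here is the first command spread over several lines with comments; the logic is identical (p holds the previous value of column 1):

awk 'BEGIN { FS = OFS = "," }        # read and write comma-separated fields
     $1 != p { p = $1; print; next } # new col1 value: remember it, print the row as-is
     { $1 = "" }                     # repeated col1 value: blank the first field
     1                               # always-true pattern: print the (modified) row
    ' f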

Upvotes: 0

Ajay Brahmakshatriya

Reputation: 9203

You can do this easily with awk.

The command would be

awk -F"," '{if(!a[$1]++) print $0;}' file.csv

Set the delimiter according to your CSV format with the -F flag.
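
Note that, unlike the sample output in the question, this keeps only the first row for each col1 value and drops the later duplicate rows entirely. Assuming file.csv holds the sample input from the question, it prints:

$ awk -F"," '{if(!a[$1]++) print $0;}' file.csv
col1,col3
od1,pd1
od2,pd1
od3,pd6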

Upvotes: 0

anubhava

Reputation: 784968

awk can handle this easily, using an associative array keyed on col1:

awk 'BEGIN{FS=OFS=","} seen[$1]++{$1=""} 1' file

col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
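
Spelled out with comments, the same program reads as follows (seen[$1]++ returns the count of earlier rows with this col1 value, so it is false the first time and true afterwards):

awk 'BEGIN { FS = OFS = "," }  # comma-separated input and output
     seen[$1]++ { $1 = "" }    # col1 seen before: blank the first field
     1                         # always-true pattern: print every row
    ' file

This also works when the file is not sorted, since the array remembers every col1 value seen so far.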

Upvotes: 1
