Reputation: 42
I have a CSV file with more than 10,000 lines in two columns. I have to delete the duplicate entries from column 1. Sample input:
col1,col3
od1,pd1
od1,pd4
od2,pd1
od2,pd2
od3,pd6
od3,pd688
od3,pg7
Sample output:
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
Upvotes: 0
Views: 86
Reputation: 16997
If your file is sorted (by the 1st field), then:
$ cat f
col1,col3
od1,pd1
od1,pd4
od2,pd1
od2,pd2
od3,pd6
od3,pd688
od3,pg7
# Either
$ awk 'BEGIN{FS=OFS=","}$1!=p{p=$1;print;next}{$1=""}1' f
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
# Or
$ awk 'BEGIN{FS=OFS=","}$1==p{$1="";print;next}{p=$1}1' f
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
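If the input is not already sorted, you could sort the body by the first field (keeping the header line first) before feeding it to either command. A minimal sketch, assuming the header is the first line of f:
$ (head -n 1 f; tail -n +2 f | sort -t, -k1,1) | awk 'BEGIN{FS=OFS=","}$1!=p{p=$1;print;next}{$1=""}1'
For the sample data the output is the same as above, since it is already sorted.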
Upvotes: 0
Reputation: 9203
You can do this easily with awk. The command would be:
awk -F"," '{if(!a[$1]++) print $0;}' file.csv
Set the delimiter according to your CSV format with the -F flag.
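Note that this prints only the first row for each value of column 1, rather than blanking the field on repeats. Run on the sample input from the question (saved here as file.csv), it prints:
$ awk -F"," '{if(!a[$1]++) print $0;}' file.csv
col1,col3
od1,pd1
od2,pd1
od3,pd6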
Upvotes: 0
Reputation: 784968
awk can handle this easily using an associative array with col1 as the key:
awk 'BEGIN{FS=OFS=","} seen[$1]++{$1=""} 1' file
col1,col3
od1,pd1
,pd4
od2,pd1
,pd2
od3,pd6
,pd688
,pg7
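seen[$1]++ evaluates to 0 (false) the first time a col1 value is seen and to a positive count on every repeat, so $1 is cleared only for duplicates; the bare 1 at the end is an always-true pattern whose default action prints each (possibly modified) line. To keep the result, you could redirect it to a new file, e.g. (output.csv is just an example name):
$ awk 'BEGIN{FS=OFS=","} seen[$1]++{$1=""} 1' file > output.csv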
Upvotes: 1