Reputation: 15
I have a CSV file with 17 columns and many thousands of rows. I want to remove rows whose column 2 value duplicates an earlier row, keeping only the first occurrence.
File example:
1001,Henry
1002,Dave
1003,Dave
1004,Tom
When I run:
sort -t, -k2,2 -u file.csv -o newfile.csv
newfile.csv contains (wrong):
1001,Henry
1004,Tom
desired output:
1001,Henry
1002,Dave
1004,Tom
I've tried several things with awk as well, no luck. Thanks in advance!
Upvotes: 0
Views: 95
Reputation: 336
Try this:
awk -F ',' '!seen[$2]++' file.csv > newfile.csv
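This command tells awk which lines to print. With -F ',' the field separator is a comma, so $2 holds the entire contents of column 2, and the square brackets index into an associative array named seen. For each line of file.csv, awk tests the current value of seen[$2] and then increments it, so the line is printed only if that column 2 value has not (!) been seen before. Because lines are processed in input order, the first occurrence is kept and the original row order is preserved.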
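As a quick check, here is a minimal sketch that recreates the sample data from the question (only the two columns shown there, not all 17) and runs the same command. It assumes no field contains a quoted comma, since -F ',' splits on every comma.

```shell
# Recreate the four sample rows from the question.
printf '%s\n' '1001,Henry' '1002,Dave' '1003,Dave' '1004,Tom' > file.csv

# seen[$2]++ yields the count *before* incrementing: 0 (false) the first time
# a column 2 value appears, so !seen[$2]++ is true and awk prints that line.
awk -F ',' '!seen[$2]++' file.csv > newfile.csv

cat newfile.csv
# 1001,Henry
# 1002,Dave
# 1004,Tom
```

The row 1003,Dave is dropped because seen["Dave"] is already 1 when awk reaches it.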
Upvotes: 2