RatVader

Reputation: 23

Is there an easy way to remove duplicate fields in a CSV?

I have a CSV with dates and names. I want to blank out duplicate dates but keep all the names, using bash tools. E.g.:

Input CSV:

01/01/1960,"Torvalds, Linus"
01/01/1960,"Gates, Bill"
04/20/1960,"Woz, Steve"

Output CSV:

01/01/1960,"Torvalds, Linus"
,"Gates, Bill"
04/20/1960,"Woz, Steve"

I know I could do this using loops and line numbers, but I'm wondering if there's a better way. Perhaps using sort or uniq?
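For reference, the loop-based approach I have in mind looks roughly like this (the filename input.csv is just an example), which works but feels clunky:

```shell
# Track the previous date; blank the date field on consecutive repeats.
prev=""
while IFS= read -r line; do
  date=${line%%,*}    # everything before the first comma
  rest=${line#*,}     # everything after it (the quoted name)
  if [ "$date" = "$prev" ]; then
    printf ',%s\n' "$rest"
  else
    printf '%s,%s\n' "$date" "$rest"
    prev=$date
  fi
done < input.csv
```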

Upvotes: 2

Views: 58

Answers (1)

P....

Reputation: 18371

Using sort and awk:

$ sort -t',' -k1 input.csv | awk -F, -v OFS=, '($1 in a) {gsub($1,"")} !($1 in a) {a[$1]=$1} 1'
01/01/1960,"Gates, Baleee"
,"Gates, Benny"
,"Gates, Bill"
,"Torvalds, Linus"
04/20/1960,"Wozniac, pitor"
,"Woz, Steve"

When input.csv is as follows:

$ cat input.csv
01/01/1960,"Torvalds, Linus"
01/01/1960,"Gates, Bill"
01/01/1960,"Gates, Benny"
04/20/1960,"Woz, Steve"
01/01/1960,"Gates, Baleee"
04/20/1960,"Wozniac, pitor"

Explanation: Here, FS is set to a comma. For each line, the first column is stored in the array a if it's not already present; if it is already present, the first column is replaced with a blank using the gsub function. sort is used first to bring identical dates together.
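A slightly shorter sketch of the same idea, if you prefer: seen[$1]++ is false only on the first occurrence of a date, so blanking $1 on every later occurrence gives the same result (assigning to $1 makes awk rebuild the line with OFS). The filename input.csv is assumed, and the input must already be sorted (or at least grouped) by date:

```shell
# Blank the date field whenever that date has been seen before.
awk -F, -v OFS=, 'seen[$1]++ { $1="" } 1' input.csv
```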

Upvotes: 1

Related Questions