Reputation: 23
I have a CSV that has dates and names, I want to sort out duplicate dates, but leave all the names by using bash tools. Eg:
Input CSV:
01/01/1960,"Torvalds, Linus"
01/01/1960,"Gates, Bill"
04/20/1960,"Woz, Steve"
Output CSV:
01/01/1960,"Torvalds, Linus"
,"Gates, Bill"
04/20/1960,"Woz, Steve"
I know I could do this using loops and line numbers, but I'm wondering if there's a better way. Perhaps using sort or uniq?
Upvotes: 2
Views: 58
Reputation: 18371
Using sort
and awk
:
sort -t',' -k1 input.csv |awk -F, -v OFS=, '($1 in a) {gsub($1,"") } !($1 in a) {a[$1]=$1}1'
01/01/1960,"Gates, Baleee"
,"Gates, Benny"
,"Gates, Bill"
,"Torvalds, Linus"
04/20/1960,"Wozniac, pitor"
,"Woz, Steve"
When the input.csv is as follow:
-->cat input.csv
01/01/1960,"Torvalds, Linus"
01/01/1960,"Gates, Bill"
01/01/1960,"Gates, Benny"
04/20/1960,"Woz, Steve"
01/01/1960,"Gates, Baleee"
04/20/1960,"Wozniac, pitor"
Explanation: Here, FS
is set to comma and for each line first column is stored in an array named a
if its not already present and if its already present then replace the first column with blank using gsub
function. sort
is used to bring dates in sequence.
Upvotes: 1