Merge two CSVs while resolving duplicates

Question

I have a couple of csv's that i need to merge. I want to consider entries which have the same first and second columns as duplicates. I know the command for this is something like

sort -t"," -u -k 1,1 -k 2,2 file1 file2

I additionally want to resolve the duplicates in such a way that the entry from the second file is chosen everytime. What is the way to do that?

Jim Mischel · Accepted Answer

If the suggestion to reverse the order of files to the sort command doesn't work (see other answer), another way to do this would be to concatenate the files, file2 first, and then sort them with the -s switch.

cat file2 file1 | sort -t"," -u -k 1,1 -k 2,2 -s

-s forces a stable sort, meaning that identical lines will appear in the same relative order. Since the input to sort has all of the lines from file2 before file1, all of the duplicates in the output should come from file2.

The sort man page doesn't explicitly state that input files will be read in the order that they're supplied on the command line, so I guess it's possible that an implementation could read the files in reverse order, or alternating lines, or whatever. But if you concatenate the files first then there's no ambiguity.

Merge two CSVs while resolving duplicates

Answers (2)

Related Questions