Reputation: 1928
I have a couple of CSVs that I need to merge. I want to consider entries that have the same first and second columns as duplicates. I know the command for this is something like
sort -t"," -u -k 1,1 -k 2,2 file1 file2
I additionally want to resolve the duplicates in such a way that the entry from the second file is chosen every time. What is the way to do that?
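For example (made-up data, just to illustrate), given

file1:
a,1,from_file1
b,2,from_file1

file2:
a,1,from_file2
c,3,from_file2

the merged output should keep file2's line for the duplicated key a,1:

a,1,from_file2
b,2,from_file1
c,3,from_file2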
Upvotes: 1
Views: 97
Reputation: 134125
If the suggestion to reverse the order of the files given to the sort command doesn't work (see the other answer), another way to do this would be to concatenate the files, file2 first, and then sort them with the -s switch.
cat file2 file1 | sort -t"," -u -k 1,1 -k 2,2 -s
-s forces a stable sort, meaning that lines whose keys compare equal will appear in the output in the same relative order as in the input. Since the input to sort has all of the lines from file2 before file1, all of the duplicates in the output should come from file2.
The sort man page doesn't explicitly state that input files will be read in the order that they're supplied on the command line, so I guess it's possible that an implementation could read the files in reverse order, or alternating lines, or whatever. But if you concatenate the files first then there's no ambiguity.
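With the made-up sample files from the question, the run would look like this (assuming GNU sort):

cat file2 file1 | sort -t"," -u -k 1,1 -k 2,2 -s
# a,1,from_file2    <- file2's copy of the duplicated key survives
# b,2,from_file1
# c,3,from_file2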
Upvotes: 1
Reputation: 5533
Changing the order of the two files and adding -s (@Jim Mischel gave the hint) will solve your problem.
sort -t"," -u -k 1,1 -k 2,2 -s file2 file1
man sort
-u, --unique
with -c, check for strict ordering; without -c, output only the
first of an equal run
-s, --stable
stabilize sort by disabling last-resort comparison
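To make the effect of -s concrete, here is a sketch using the made-up sample data from the question (without -s the outcome is implementation-dependent, because an implementation may break ties with the last-resort whole-line comparison):

sort -t"," -u -k 1,1 -k 2,2 -s file2 file1
# a,1,from_file2    <- equal keys keep input order, so file2 wins
# b,2,from_file1
# c,3,from_file2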
Short answer
awk -F"," '{out[$1$2]=$0} END {for(i in out) {print out[i]}}' file1 file2
A slightly longer answer:
awk 'BEGIN {
    FS = OFS = ",";          # set ',' as the field separator
}
{
    out[$1 FS $2] = $0;      # key on columns 1 and 2, joined with FS so that
                             # e.g. "ab","c" and "a","bc" do not collide; a
                             # later line with the same key replaces the
                             # earlier one, so file2 entries win
}
END {
    for (i in out) {         # at the end, print every value in the dict
        print out[i];
    }
}' file1 file2
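One caveat: for (i in out) visits the keys in an unspecified order, so unlike the sort-based answers this does not produce sorted output. If sorted output is needed, pipe through sort (a sketch, reusing the same key columns):

awk -F"," '{out[$1 FS $2]=$0} END {for (i in out) print out[i]}' file1 file2 | sort -t"," -k 1,1 -k 2,2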
Upvotes: 1