Reputation: 77085
Say I have two files:
file1:
1|abc
2|cde
3|pkr
file2:
1|abc
2|cde
4|lkg
How can I list the true difference between both files using awk? If the second file is a subset of the first file, I can do the following:
awk -F"|" 'NR==FNR{a[$1]=$2;next} !($1 in a)' file{1,2}
But this only gives me:
4|lkg
I would like to get the following output instead, since that is the true difference:
3|pkr
4|lkg
Criteria for difference:
Some background:
File 1 and File 2 are table exports from different databases. Each has two fields separated by a pipe delimiter. Field 1 is always unique; field 2 may repeat.
My intention is to run an awk one-liner on them to find the true differences. If I run the command above twice (passing file 1 first on the first run and file 2 first on the second run), I get the records that are missing from each file. However, I want to do this in a single pass.
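For reference, the two runs described above can be folded into a single awk pass by deleting each key from the array once it is seen in the second file. This is a minimal sketch, assuming (like the command above) that records are compared on field 1 only, so 1|abc and 1|xyz would count as the same record; that field 1 is unique within each file; and that the first file is non-empty, since an empty first file breaks the NR==FNR test. The function name keyed_diff and the scratch files are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print records whose field-1 key occurs in only one of two files.
# Assumes field 1 is unique per file and the first file is non-empty.
keyed_diff() {
  awk -F'|' '
    NR == FNR { a[$1] = $0; next }    # first file: index each full record by its key
    ($1 in a) { delete a[$1]; next }  # key present in both files: forget it
    { print }                         # key only in the second file
    END { for (k in a) print a[k] }   # keys never matched exist only in the first file
  ' "$1" "$2"
}

# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1|abc\n2|cde\n3|pkr\n' > "$dir/file1"
printf '1|abc\n2|cde\n4|lkg\n' > "$dir/file2"

keyed_diff "$dir/file1" "$dir/file2"
```

Records only in the second file are printed as they are read (4|lkg here), and records only in the first file are printed at the end (3|pkr here). The order of a for (k in a) loop is unspecified, so pipe the result through sort if you need a stable order.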
Upvotes: 0
Views: 5499
Reputation: 85775
This is what comm does (-3 suppresses the third column, i.e. lines that appear in both files):
$ comm -3 <(sort file1) <(sort file2)
If, say, a|1 is in file1 once and in file2 twice, then a|1 will appear once in the output, because only one of the occurrences in file2 was matched in file1. If you don't want this behavior, and a|1 should be left out of the output because it is seen at least once in each file, then use the -u option with sort:
$ comm -3 <(sort -u file1) <(sort -u file2)
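A small demonstration of the duplicate behavior described above, using made-up files dup1 and dup2 in which a|1 appears once and twice respectively. Lines unique to the second file are printed in comm's second column, which means they carry a leading tab. LC_ALL=C is set here as an assumption for reproducibility, so that sort and comm agree on collation; process substitution requires bash.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # make sort and comm use the same byte-wise collation

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a|1\n'      > "$dir/dup1"   # a|1 once
printf 'a|1\na|1\n' > "$dir/dup2"   # a|1 twice

# Without -u: one copy of a|1 in dup2 has no partner in dup1,
# so it is printed in column 2 (prefixed with a tab).
comm -3 <(sort "$dir/dup1") <(sort "$dir/dup2")

# With -u: each side holds a single a|1, which is common to both,
# so column 3 is suppressed and nothing is printed.
comm -3 <(sort -u "$dir/dup1") <(sort -u "$dir/dup2")
```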
Upvotes: 3
Reputation: 67211
diff file1 file2 | perl -lne 'if(/^[<>]/){s/^..//g;print}'
Below is a test:
> cat file1
a|1
b|2
c|1
> cat file2
b|2
c|1
d|0
> diff file1 file2 | perl -lne 'if(/^[<>]/){s/^..//g;print}'
a|1
d|0
>
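One caveat with the diff approach: diff compares the files in order, so a record that is present in both files but at different positions can be reported as removed from one file and added to the other. Sorting both inputs first avoids that. The sketch below does so and, as a swapped-in alternative to the Perl filter, uses sed -n 's/^[<>] //p' to keep only the "< " and "> " lines with that prefix stripped. The function name sorted_diff is hypothetical, and process substitution requires bash.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # consistent sort order

# Hypothetical helper: diff two files after sorting them, keeping only the
# lines diff marks with "< " or "> " and stripping that two-character prefix.
sorted_diff() {
  diff <(sort "$1") <(sort "$2") | sed -n 's/^[<>] //p'
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a|1\nb|2\nc|1\n' > "$dir/file1"
printf 'c|1\nd|0\nb|2\n' > "$dir/file2"   # same records as the test above, reordered

sorted_diff "$dir/file1" "$dir/file2"
```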
Upvotes: 1
Reputation: 20688
If you really want to use awk:
$ cat f1
a|1
b|2
c|1
$ cat f2
b|2
c|1
d|0
$ awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' f1 f2
a|1
d|0
$
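Note that h[$0] = ! h[$0] flips a flag on every occurrence, so a line is printed only if it occurs an odd number of times across both files: a line repeated twice in f1 and absent from f2 would be missed. The sketch below instead records which file each distinct line was seen in. It assumes the first file is non-empty (NR == FNR misbehaves otherwise); the function name line_diff is hypothetical. Since for (k in ...) order is unspecified, the output is sorted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whole lines that occur in exactly one of two files,
# regardless of how many times they repeat within that file.
line_diff() {
  awk '
    NR == FNR { in1[$0]; next }   # first file: remember each distinct line
    { in2[$0] }                   # second file: remember each distinct line
    END {
      for (k in in1) if (!(k in in2)) print k   # only in the first file
      for (k in in2) if (!(k in in1)) print k   # only in the second file
    }' "$1" "$2"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a|1\nb|2\nc|1\n' > "$dir/f1"
printf 'b|2\nc|1\nd|0\n' > "$dir/f2"

line_diff "$dir/f1" "$dir/f2" | sort   # for-in order is unspecified, so sort
```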
Upvotes: 4