Reputation: 77185

List differences in two files using awk

Say I have two files:

File1:

1|abc
2|cde
3|pkr

File2:

1|abc
2|cde
4|lkg

How can I list the true differences between the two files using awk? If the first file is a subset of the second, I can do the following:

awk -F"|" 'NR==FNR{a[$1]=$2;next} !($1 in a)' file{1,2}

But this gives me only:

4|lkg

I would like to get the following output, since that is the true difference:

3|pkr
4|lkg

Criteria for difference:

Field 1 is present in file1 but not in file2.
Field 1 is present in file2 but not in file1.
Field 1 is present in both files, but field 2 differs.

Some background:

File 1 and File 2 are exports of a table from different databases. Each has two fields separated by a pipe delimiter. Field 1 is always unique. Field 2 may repeat.

My intention is to run an awk one-liner on them to find the true differences. If I run the command above twice (passing file 1 first on the first run and file 2 first on the second), I get the records missing from each file. However, I want to do this in a single pass.

Upvotes: 0

Answers (3)

Chris Seymour

Reputation: 85883

This is what comm does:

$ comm -3 <(sort file1) <(sort file2)

If, say, a|1 appears once in file1 and twice in file2, then a|1 will appear once in the output, because only one of the occurrences in file2 is matched by file1. If you don't want this behavior, and a line seen at least once in each file should not appear in the output, use the -u option with sort:

$ comm -3 <(sort -u file1) <(sort -u file2)
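One detail worth knowing: with -3, comm still indents lines unique to the second file with a leading tab. The sketch below is only an illustration. The function name `comm_diff` is made up, `mktemp -d` is assumed to be available, and temporary files replace `<(...)` so it does not depend on bash process substitution. It strips that tab so both sides print flush left.

```shell
# Hypothetical helper: whole-line symmetric difference of two files via comm.
comm_diff() {
    d=$(mktemp -d)
    # comm requires sorted input; -u also collapses duplicate lines.
    sort -u "$1" > "$d/a"
    sort -u "$2" > "$d/b"
    tab=$(printf '\t')
    # -3 hides lines common to both files. Lines found only in the second
    # file are still prefixed with one tab, so remove a single leading tab.
    comm -3 "$d/a" "$d/b" | sed "s/^$tab//"
    rm -rf "$d"
}

# Demo with the sample data from the question.
demo=$(mktemp -d)
printf '%s\n' '1|abc' '2|cde' '3|pkr' > "$demo/file1"
printf '%s\n' '1|abc' '2|cde' '4|lkg' > "$demo/file2"
comm_diff "$demo/file1" "$demo/file2"
```

For the question's files this should print 3|pkr and 4|lkg. Note that removing the tab would also eat a real leading tab in the data, so this only suits records that never start with one.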

Upvotes: 3

Vijay

Reputation: 67319

diff file1 file2 | perl -lne 'if(/^[<>]/){s/^..//g;print}'

Below is the test:

> cat file1
a|1
b|2
c|1
> cat file2
b|2
c|1
d|0
> diff file1 file2 | perl -lne 'if(/^[<>]/){s/^..//g;print}'
a|1
d|0
>
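Keep in mind that diff is order-sensitive, so the pipeline above relies on the common records appearing in the same order in both files. A minimal sketch that sorts first, assuming record order does not matter, might look like this. The name `sorted_diff` is made up, `mktemp -d` is assumed to be available, and sed stands in for the perl filter.

```shell
# Hypothetical helper: order-independent line difference via sort + diff.
sorted_diff() {
    d=$(mktemp -d)
    sort "$1" > "$d/a"
    sort "$2" > "$d/b"
    # Keep only the changed lines and drop the two-character
    # "< " or "> " marker that diff puts in front of them.
    diff "$d/a" "$d/b" | sed -n 's/^[<>] //p'
    rm -rf "$d"
}

# Demo: same records as the test above, but in a different order.
demo2=$(mktemp -d)
printf '%s\n' 'c|1' 'a|1' 'b|2' > "$demo2/file1"
printf '%s\n' 'b|2' 'd|0' 'c|1' > "$demo2/file2"
sorted_diff "$demo2/file1" "$demo2/file2"
```

With these inputs the output should again be a|1 and d|0, even though the shared lines are ordered differently in the two files.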

Upvotes: 1

pynexj

Reputation: 20797

If you really want to use awk:

$ cat f1
a|1
b|2
c|1
$ cat f2
b|2
c|1
d|0
$ awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' f1 f2
a|1
d|0
$
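The one-liner above compares whole lines. If you would rather key the comparison on field 1, as the criteria in the question describe, a single-pass sketch could look like the following. This is only an illustration: the function name `keydiff` is made up, and printing both versions of a record whose value differs is an assumption, since the question does not say how such records should be shown.

```shell
# Hypothetical helper: compare two pipe-delimited files by key (field 1).
keydiff() {
    awk -F'|' '
        # First file: remember the value stored under each key.
        NR == FNR { a[$1] = $2; next }
        # Second file: key not seen in the first file.
        !($1 in a) { print; next }
        # Key in both files but values differ (the "" forces a string
        # comparison): print the first file version, then the second.
        (a[$1] "") != $2 { print $1 FS a[$1]; print }
        # Forget keys seen in both files so END reports only the rest.
        { delete a[$1] }
        # Whatever is left exists only in the first file.
        END { for (k in a) print k FS a[k] }
    ' "$1" "$2"
}

# Demo with the sample data from the question; the order of "for (k in a)"
# is unspecified, so sort the result for stable output.
demo3=$(mktemp -d)
printf '%s\n' '1|abc' '2|cde' '3|pkr' > "$demo3/file1"
printf '%s\n' '1|abc' '2|cde' '4|lkg' > "$demo3/file2"
keydiff "$demo3/file1" "$demo3/file2" | sort
```

For the question's files this should print 3|pkr and 4|lkg. One caveat: the NR == FNR idiom misbehaves when the first file is empty, because every record of the second file then satisfies it.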

Upvotes: 4
