Reputation: 1145

Create diff between two files based on specific column

I have the following problem.

Say I have 2 files:

A.txt

1    A1
2    A2

B.txt

1    B1
2    B2
3    B3

I want a diff that is based only on the values in the first column, so the result should be

3     B3

How can this be solved with bash on Linux?

Upvotes: 0

Answers (3)

Reputation: 21965

awk is your friend

awk 'NR==FNR{f[$1];next}{if($1 in f){next}else{print}}' A.txt B.txt

or more simply

awk 'NR==FNR{f[$1];next}!($1 in f){print}' A.txt B.txt

or even more simply

awk 'NR==FNR{f[$1];next}!($1 in f)' A.txt B.txt

A bit of explanation will certainly help

NR and FNR are awk built-in variables: NR is the total number of records read so far (including the current one), and FNR is the number of records read so far in the current file. They are equal only while the first file is being processed (assuming that file is not empty).
f[$1] creates the array f on first use and adds $1 as a key if that key does not exist yet. No value is assigned, so the element holds awk's uninitialized value (an empty string, or zero in a numeric context), but the value is not used in your case.
next skips to the next record without processing the rest of the awk script.
Note that the {if($1 in f){next}else{print}} part is reached only for the second (and any subsequent) file.
$1 in f checks whether the key $1 exists in the array f.
The if-else-print part is self-explanatory.
Note that in the third version the {print} is omitted, because printing the record is awk's default action when a pattern has no action.
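
To make the steps above concrete, here is a minimal sketch of the same one-liner spread over several lines with comments. The scratch directory and the array name seen are illustrative choices, not part of the original answer; the sample data is the A.txt and B.txt from the question.

```shell
# Recreate the sample files from the question in a scratch directory
# (the directory is a throwaway created just for this sketch).
workdir=$(mktemp -d)
printf '1    A1\n2    A2\n' > "$workdir/A.txt"
printf '1    B1\n2    B2\n3    B3\n' > "$workdir/B.txt"

# Same logic as the one-liners, one step per line.
result=$(awk '
    NR == FNR { seen[$1]; next }   # first file: remember each column-1 key, then skip to the next record
    !($1 in seen)                  # later files: print records whose key was never seen
' "$workdir/A.txt" "$workdir/B.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample data this prints the single line 3    B3, because 3 is the only first-column value in B.txt that does not occur in A.txt.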

Upvotes: 4

Reputation: 18391

awk 'NR==FNR{array[$1];next} !($1 in array)' A.txt B.txt
3    B3

Upvotes: 2

Reputation: 207688

Like this in bash, but only if you are really not interested in the second column at all:

diff <(cut -f1 -d" " A.txt) <(cut -f1 -d" " B.txt)
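
As a rough sketch of what this approach yields on the sample data, the snippet below runs the same cut and diff steps and then keeps only the keys that diff reports as present in the second file alone. It uses temporary files instead of process substitution so that it also runs under a plain POSIX sh; the file names A.keys and B.keys and the sed post-processing are additions for illustration, and the result is only meaningful when both files list their keys in the same order.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '1    A1\n2    A2\n' > "$workdir/A.txt"
printf '1    B1\n2    B2\n3    B3\n' > "$workdir/B.txt"

# Keep only the first space-delimited field of each file.
cut -d' ' -f1 "$workdir/A.txt" > "$workdir/A.keys"
cut -d' ' -f1 "$workdir/B.txt" > "$workdir/B.keys"

# diff prefixes lines found only in the second input with "> "; sed strips that marker.
# diff exits with status 1 when the inputs differ, so that status is ignored here.
only_in_b=$({ diff "$workdir/A.keys" "$workdir/B.keys" || true; } | sed -n 's/^> //p')

printf '%s\n' "$only_in_b"
rm -rf "$workdir"
```

On the sample data this prints 3, the key itself, without the B3 from the second column, which is the limitation mentioned above.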

Upvotes: 0