monk
monk

Reputation: 2115

join two files based on three columns

I have following two files, I need to join them based on the value of three columns, and if not matched print NA in the unmatched column.

cat f1
AAA  0  node4  Activated  Unreachable  down
AAA  1  node3  Activated  Pingable     cool

cat f2 
AAA  0  node3  XYZ  Active

Currently I am getting incorrect output using:

 awk 'NR==FNR{a[$1]=$1;b[$2]=$2;c[$3]=$3;next} $1 in a && $2 in b &&  $3 in c{print $0}' f1 f2
AAA 0 node3 XYZ Active

Desired output:

AAA 0 node4 Activated Unreachable down NA
AAA 1 node3 Activated Pingable    cool Active

Upvotes: 1

Views: 130

Answers (2)

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

awk approach:

awk 'NR==FNR{a[$1,$3]=$5; next}{$7="NA";if(($1,$3) in a){$7=a[$1,$3]} print}' f2 f1

The output:

AAA 0 node4 Activated Unreachable down NA
AAA 1 node3 Activated Pingable cool Active

a[$1,$3]=$5 - stores the value of the fifth field $5 in the second file f2 using combination of first $1 and third $3 fields as array key

$7="NA"; - initiates an additional seventh field $7 with default value "NA"

Upvotes: 1

Inian
Inian

Reputation: 85550

Use an Awk logic as below,

awk 'FNR==NR{hash[$1FS$3]=$NF; next}{for(i in hash) if (match(i,$1FS$3)) { $(NF+1)=hash[i] } else { $(NF+1)="NA" } }1' f2 f1

which produces output as you needed.

AAA 0 node4 Activated Unreachable down NA
AAA 1 node3 Activated Pingable cool Active

The idea is to parse the second file first to store the status, index by the node value into the array hash. Then on the first file, a loop on all indices and if the value in $3 on f1 matches the hashed value, print the status accordingly and it not found just print NA.

Upvotes: 1

Related Questions