Lucia
Lucia

Reputation: 911

Comparing fields of two files in awk

I want to compare two fields of two files, such as follows:

Compare the 2nd filed of file one with the 1st field of file two, print the match (even if the match is repeated) and all the columns of file one and two.

File 1:

 G4   b45  3  4
 G4   b45  1  3
 G3   b23  2  2
 G3   b22  2  6
 G3   b22  2  4

File 2:

 b45  a  b  c
 b64  d  e  f  
 b23  g  h  i
 b22  j  k  l
 b20  m  n  o     

Output:

 G4   b45  a  b  c  3  4
 G4   b45  a  b  c  1  3
 G3   b23  g  h  i  2  2
 G3   b22  j  k  l  2  6
 G3   b22  j  k  l  2  4

I have tried this with the following awk command using associative arrays:

awk 'FNR==NR {array1[$2] = $1 ; arrayrest[$2] = substr($0, index($0, $2)); next}($1 in array1) {print array1[$1] "\t" $0 "\t" arrayrest[$1]}' file1 file2

But there are two problems:

  1. It does not print the lines if the match is repeated while I want them to be printed.
  2. It repeats the first field of file two in the output.

How could I make this awk command work nicely? Thanks in advance.

Upvotes: 1

Views: 2437

Answers (3)

user3442743
user3442743

Reputation:

small awk

awk '{x[$1]=$0}$2=x[$2]' f2 f1

If $1 and $2 can contain the same value

 awk '{x[$1]=$0}FNR!=NR&&$2=x[$2]' f2 f1

output

G4  b45  a  b  c 3 4
G4  b45  a  b  c 1 3
G3  b23  g  h  i 2 2
G3  b22  j  k  l 2 6
G3  b22  j  k  l 2 4

Upvotes: 0

Hackaholic
Hackaholic

Reputation: 19733

This should be simple and clear to u:

awk 'NR==FNR {n[$2]=$0} {if ($1 in n) print n[$1],$2,$3,$4}' file1 file2

Upvotes: 1

Etan Reisner
Etan Reisner

Reputation: 80921

Not quite the exact output formatting you want but the right output contents.

awk 'FNR==NR{seen[$1]=$0; next} ($2 in seen) {$2=seen[$2]}7' file2 file1

Add | column -t to get more consistent column spacing.

Upvotes: 2

Related Questions