eskp
eskp

Reputation: 193

Compare first column of one file with the first column of second and print associated column of each if there was a match

I have two files, I need to compare their first columns and if the match is found, I'd like to output the corresponding values from both files.

Similar to this Q but I'd like to print columns from both files not one: How to compare multiple columns in two files and retrieve the corresponding value from another column if match found

File1.txt

adeqY   33.7
AIsLX   65.6
AmuBv   1589.0
aZMIx   84.4

File2.txt

AmuBv foo
iwwlp bar
adeqY hi
qUbJZ bye

Output

hi 33.7
foo 1589.0

I have the following awk command but I only managed to print the second column match from File2:

awk 'FNR==NR{a[$1]; next} ($1) in a {print $2 a[$2]}' File1.txt File2.txt

a[$2] doesn't want to print

Thanks in advance.

Upvotes: 4

Views: 3009

Answers (4)

stack0114106
stack0114106

Reputation: 8711

Try Perl variant

$ cat eskp1.txt
adeqY   33.7
AIsLX   65.6
AmuBv   1589.0
aZMIx   84.4

$ cat eskp2.txt
AmuBv foo
iwwlp bar
adeqY hi
qUbJZ bye

$ perl -F"\s+" -lane 'BEGIN { %kv=map{split /\s+/ } qx(cat eskp1.txt) } { print "$F[1] $kv{$F[0]}" if $kv{$F[0]} } ' eskp2.txt
foo 1589.0
hi 33.7

Upvotes: 0

Alex Harvey
Alex Harvey

Reputation: 15472

What you are trying to do there is essentially an INNER JOIN on two tables stored in text files and the Linux join command is designed for just that.

Try:

join -t' ' -1 1 -2 1 -o 2.2,1.2 <(sort file1.txt) <(sort file2.txt) 
foo 1589.0
hi 33.7

Explanation:

  • The field separator is specified as -t. I assumed that your data is separated by a single space in these text files.
  • The -1 1 -2 1 is says to join on the first field for the left-hand side file and the first field on the right-hand side file.
  • The -o 2.2,1.2 defines the fields you want to return. The second field from the second file then the second field from the first file.
  • Finally, note that I sorted the input files as the files need to be sorted by the field that you intend to join by for the Linux join to work.

See also this blog (ref).

Upvotes: 1

mjuarez
mjuarez

Reputation: 16824

What's happening here is that, once you're printing results in the last awk statement, the a[] array is no longer in scope, hence why that second value is not printing.

Maybe there is another way to do this awk, but this is the solution I came up with:

for each in $(comm -1 -2 <(awk '{print $1 }' file1.txt | sort ) <(awk '{print $1 }' file2.txt | sort) ); do echo $(grep $each file2.txt | awk '{print $2}') $(grep $each file1.txt | awk '{print $2}') ; done;

This outputs:

foo 1589.0
hi 33.7

Explanation:

  • Run a comm command on the two files.
  • The two "files" given to comm are actually process substituted, so that they are sorted first (comm expects sorted input) and only the first column is shown.
  • The -1 -2 parameters to comm tells it to only print the common elements from the files (it can show elements unique to first file, unique to second file, or common to both)
  • Once you have the common elements from both files, for each of those common elements, go ahead and grep each of the files for the line where it appears, and show only the second value, using awk.

So, in the end, we needed a bash for loop, sort, comm and awk multiple times. Not the most elegant solution probably, but it gets the job done.

Upvotes: 0

RavinderSingh13
RavinderSingh13

Reputation: 133428

Could you please try following.

awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' Input_file1  Input_file2

Output will be as follows.

foo 1589.0
hi 33.7

Problem in your attempt: You was going good only thing in FNR==NR condition your a[$1] is NOT having any value it only created its index in array a so that is why it was not able to print anything when 2nd Input_file is being read.

Upvotes: 2

Related Questions