Reputation: 387
Is it possible to use awk to compare and return results from both files that match?
I am currently using:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{c[$1$2]++;next};c[$1$2]>0' queryfile hitsfile
to match records from the query file and print the matching lines from the hits file; however, it only returns the columns from the hits file.
I've tried:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{c[$1$2]++;next};c[$1$2]>0 {print $1,$2,c[$1]}'
but it doesn't work.
My example data looks like this:
queryfile
chr1 1000 1005 BDSD
chr1 1010 1015 SKK1
chr2 1015 1015 AVPR
hitsfile
chr1 1000 1005 0.5
chr1 1001 1002 0.35
chr1 1010 1015 0.4
chr1 1011 1016 0.56
chr2 1015 1015 0.1
I would like my output file to look like the following
*output results*
chr1 1000 1005 0.5 BDSD
chr1 1010 1015 0.4 SKK1
chr2 1015 1015 0.1 AVPR
So basically, the hits that match the query are returned PLUS an extra column from the query data. Is this possible using awk one-liners?
Also, another question: is it possible, given a query RANGE in the query file, to return all lines in the hitsfile that fall within that range, rather than only exact matches, with awk?
Usually I do these in R, but it's slow when processing large files, and awk is much, much faster!
Thank you!
Upvotes: 1
Views: 271
Reputation: 46826
NOTE: This answer is accurate for a previous version of the question. Please check the question's revision history for details.
If you're designing a process like this in awk, the basic stuff you'll want to think about is that to compare two files, the important bits of one of them will need to be loaded into memory. If you can make sure that the amount of memory you use won't require use of swap, you'll be ahead. :)
So ... assuming queryfile is small and hitsfile is large, you'd want something like this:
$ awk '
# First, store every line of our first file in an array. Simply mentioning
# an array element is sufficient, you don't need to assign anything.
NR == FNR {
a[$0];
next;
}
# Second, walk through any remaining data (second file, third, etc),
# comparing it to elements in the array we stored in the section above.
# If the condition here is true, the default action is to print the line.
$0 in a
' queryfile hitsfile
This can obviously be shortened to a one-liner. You know how to do that already.
The net result of this is that each line from the second file will be printed if it appeared in the first file. By extension, only lines appearing in both files will be printed.
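As a quick, self-contained check of that idiom (the two files here are made up purely for illustration):

```shell
# Two throwaway files: 'alpha' and 'beta' appear in both.
cat > file1 <<'EOF'
alpha
beta
EOF
cat > file2 <<'EOF'
beta
gamma
alpha
EOF
# Prints only the lines of file2 that also occur in file1,
# in file2's order: beta, then alpha.
awk 'NR==FNR{a[$0];next} $0 in a' file1 file2
```

Note that the output follows the order of the *second* file, since that's the one being scanned line by line while the first sits in memory.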
Using the sample data you've provided in your question, with this I get output that looks identical to the queryfile, since each item of the queryfile appears once in the hitsfile.
If this isn't the result you're looking for, please provide more detailed explanation, and perhaps example output you're looking for, in your question.
Alternate solution:
You might not need to use awk at all.
fgrep -xf queryfile hitsfile
The fgrep command is equivalent to grep -F, which compares fixed strings instead of regular expressions. The -x option tells grep to consider only whole lines, effectively anchoring the match at both the beginning and the end, like a regex ^...$. And the -f option says that the list of strings to match should be taken from the specified file, in this case queryfile.
The end result is that you've got C code running your search rather than an awk script. I'll let you do the benchmarks, since you have the large files, but I'd be interested in knowing the performance difference.
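A rough way to run that comparison yourself; the file contents below are tiny placeholders just so the commands run end to end, so only timings on your real large files mean anything:

```shell
# Toy stand-ins for the real queryfile/hitsfile.
cat > queryfile <<'EOF'
chr1 1000 1005
chr2 1015 1015
EOF
cat > hitsfile <<'EOF'
chr1 1000 1005
chr1 1001 1002
chr2 1015 1015
EOF
# The same whole-line search done both ways; compare the wall-clock times.
time awk 'NR==FNR{a[$0];next} $0 in a' queryfile hitsfile > awk.out
time grep -Fxf queryfile hitsfile > grep.out
# Sanity check: both approaches should select exactly the same lines.
cmp awk.out grep.out && echo "both approaches agree"
```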
Upvotes: 1
Reputation: 203219
$ awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $0, a[$1,$2]}' queryfile hitsfile
chr1 1000 1005 0.5 BDSD
chr1 1010 1015 0.4 SKK1
chr2 1015 1015 0.1 AVPR
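The range sub-question from the post isn't covered by the answer above. One brute-force sketch in the same spirit, under the assumption that "within" means a hit's interval is contained in a query's interval on the same chromosome, and that the files follow the sample's 4-column layout, checks every hit line against every stored query range (fine when queryfile is small):

```shell
# Recreate the sample data from the question.
cat > queryfile <<'EOF'
chr1 1000 1005 BDSD
chr1 1010 1015 SKK1
chr2 1015 1015 AVPR
EOF
cat > hitsfile <<'EOF'
chr1 1000 1005 0.5
chr1 1001 1002 0.35
chr1 1010 1015 0.4
chr1 1011 1016 0.56
chr2 1015 1015 0.1
EOF
awk '
NR == FNR {                       # first file: remember each query range
    chr[NR] = $1; lo[NR] = $2; hi[NR] = $3; lab[NR] = $4; n = NR
    next
}
{                                 # second file: test against every stored range
    for (i = 1; i <= n; i++)
        if ($1 == chr[i] && $2 >= lo[i] && $3 <= hi[i])
            print $0, lab[i]
}' queryfile hitsfile
```

With the sample data this also picks up chr1 1001 1002 0.35 (inside the BDSD range) and drops chr1 1011 1016 0.56 (its end runs past the SKK1 range). For very large query files an interval tree or a tool like bedtools would scale better than this linear scan.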
Upvotes: 1