Rodolfo Aramayo

How can I merge two files with AWK and print a given value in the resulting empty fields?

I have two files:

01File:

1   2051
2   1244
7   917
X   850
22  444
21  233
Y   47
KI270728_1  6
KI270727_1  4
KI270734_1  3
KI270726_1  2
KI270713_1  2
GL000195_1  2
GL000194_1  2
KI270731_1  1
KI270721_1  1
KI270711_1  1
GL000219_1  1
GL000218_1  1
GL000213_1  1
GL000205_2  1
GL000009_2  1

and 02File:

1   248956422
2   242193529
7   159345973
X   156040895
Y   56887902
22  50818468
21  46709983
KI270728_1  1872759
KI270727_1  448248
KI270726_1  43739
GL000009_2  201709
KI270322_1  21476
GL000226_1  15008
KI270311_1  12399
KI270366_1  8320
KI270511_1  8127
KI270448_1  7992

I need to merge these two files on field 1 and print "0" in any resulting empty fields.

I was trying to accomplish this using the following command:

 awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0 "\t" a[$1]}' 01File 02File

This produces the following output:

1   248956422   2051 
2   242193529   1244 
7   159345973   917 
X   156040895   850 
Y   56887902    47 
22  50818468    444 
21  46709983    233 
KI270728_1  1872759 6 
KI270727_1  448248  4 
KI270726_1  43739   2 
GL000009_2  201709  1 
KI270322_1  21476   
GL000226_1  15008   
KI270311_1  12399   
KI270366_1  8320    
KI270511_1  8127    
KI270448_1  7992

However, I can't work out how to adapt the command so that it prints a value, in this case zero ("0"), in the resulting empty fields, producing the following output:

1   248956422   2051 
2   242193529   1244 
7   159345973   917 
X   156040895   850 
Y   56887902    47 
22  50818468    444 
21  46709983    233 
KI270728_1  1872759 6 
KI270727_1  448248  4 
KI270726_1  43739   2 
GL000009_2  201709  1 
KI270322_1  21476   0
GL000226_1  15008   0
KI270311_1  12399   0
KI270366_1  8320    0
KI270511_1  8127    0
KI270448_1  7992    0

I would be grateful if you could point me in the right direction.

Answers (1)

thanasisp

Use a conditional expression in place of a[$1]. If no line of 01File matched the key, "0" is printed instead of the empty string.

awk 'FNR==NR{a[$1]=$2;next} {print $0 "\t" ($1 in a? a[$1]: "0")}' 01File 02File

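I also simplified the first action: 01File has only two fields, so $3 is always empty, and a[$1]=$2 FS $3 only appended a trailing separator to each stored value (the trailing spaces in the question's output).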

Output:

1   248956422   2051
2   242193529   1244
7   159345973   917
X   156040895   850
Y   56887902    47
22  50818468    444
21  46709983    233
KI270728_1  1872759 6
KI270727_1  448248  4
KI270726_1  43739   2
GL000009_2  201709  1
KI270322_1  21476   0
GL000226_1  15008   0
KI270311_1  12399   0
KI270366_1  8320    0
KI270511_1  8127    0
KI270448_1  7992    0
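A minimal sketch that turns the "given value" from the title into a parameter, assuming a POSIX-compatible awk. The fill value is passed with -v (fill is a hypothetical variable name), and OFS is set to a tab so that every output field is tab-separated. Unlike the one-liner above, it rebuilds each line from $1 and $2, so the original spacing of 02File is not preserved. The sample data are a few lines copied from the question and written to a temporary directory only for the demo.

```shell
#!/bin/sh
# Demo setup: a few sample lines from the question, written to a temporary
# directory so that any real 01File/02File in the current directory are untouched.
dir=$(mktemp -d)
printf '%s\n' '1 2051' 'X 850' 'KI270728_1 6' > "$dir/01File"
printf '%s\n' '1 248956422' 'X 156040895' 'KI270728_1 1872759' 'KI270322_1 21476' > "$dir/02File"

# fill (hypothetical name): value printed when a key of 02File is missing from 01File.
# OFS='\t': awk expands the \t escape in -v assignments, so output fields are tab-separated.
result=$(awk -v OFS='\t' -v fill=0 '
    FNR == NR { count[$1] = $2; next }                     # 01File: store column 2 by key
    { print $1, $2, (($1 in count) ? count[$1] : fill) }   # 02File: append match or fill
' "$dir/01File" "$dir/02File")

printf '%s\n' "$result"
rm -r "$dir"
```

For the real files, drop the setup lines and pass 01File 02File directly; for example, -v fill=NA would print NA instead of 0. Testing with ($1 in count) rather than checking whether count[$1] is empty also avoids creating an array element for every unmatched key.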
