justaguy

Reputation: 3022

awk to add missing to sequential order if id not found in file

The awk below looks up each id from file1 in $2 of file2 and prints the file2 line when it matches. If an id from file1 is not found in file2 (like ARRR and AAAA), I cannot figure out how to append it to the output, flagged as missing, in the same format: the next sequential number in $1, the id from file1 in $2, and the word missing in $3. Thank you :).

awk

awk -F'\t' 'NR==FNR{A[$1];next}$2 in A' file1 file2

file1 space delimited

AARS
AARS2
AARS2;TMEM151B
ARRR
AAAS
AAAA
AADAC

file2 tab-delimited

1   AARS     100.00
2   AARS2    100.00
3   AARS2;TMEM151B   100.00
4   AAAS     100.00
5   AADAC    100.00

desired output tab-delimited

1   AARS     100.00
2   AARS2    100.00
3   AARS2;TMEM151B   100.00
4   AAAS     100.00
5   AADAC    100.00
6   ARRR    missing
7   AAAA    missing

Upvotes: 1

Views: 129

Answers (1)

RomanPerekhrest

Reputation: 92854

awk solution:

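awk 'NR==FNR{ a[$0]; next }      # file1: store each id as an array key
     $2 in a{ delete a[$2] }     # file2: drop ids that were found
     1                           # file2: print every line unchanged
     END{ for(i in a) print ++FNR,i,"missing" }' file1 OFS='\t' file2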

The output:

1   AARS     100.00
2   AARS2    100.00
3   AARS2;TMEM151B   100.00
4   AAAS     100.00
5   AADAC    100.00
6   AAAA    missing
7   ARRR    missing
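
Note that `for(i in a)` visits the array keys in an unspecified order, which is why AAAA is listed before ARRR here. If the missing ids should come out in file1 order, as in the desired output, something along these lines might work. This is only a sketch: the array names `order` and `seen`, the variable `last`, and the scratch directory are my own choices for illustration, and it assumes $1 of the last line of file2 holds the highest number.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' AARS AARS2 'AARS2;TMEM151B' ARRR AAAS AAAA AADAC > "$tmp/file1"
printf '%s\t%s\t%s\n' \
    1 AARS 100.00 \
    2 AARS2 100.00 \
    3 'AARS2;TMEM151B' 100.00 \
    4 AAAS 100.00 \
    5 AADAC 100.00 > "$tmp/file2"

out=$(awk -F'\t' -v OFS='\t' '
    NR == FNR { order[++n] = $1; next }   # file1: remember ids in input order
    { seen[$2]; last = $1; print }        # file2: note the id, track the last number, pass the line through
    END {
        for (i = 1; i <= n; i++)          # walk the file1 ids in their original order
            if (!(order[i] in seen))
                print ++last, order[i], "missing"
    }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data, the five file2 lines are printed unchanged, followed by `6 ARRR missing` and `7 AAAA missing` (tab-separated). Storing the ids in a numerically indexed array is what keeps the file1 order; the cost is one extra array compared with the shorter version above.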

Upvotes: 2
