Replace values in fileA if values are in fileB using awk / sed

Question

I have 2 files, and I want to replace values in fileA with values from fileB (when a match is present).

The idea is to process fileA line by line and check if the "gene_id" value (column #3) is somewhere in the column #1 of fileB.

In the first line of fileA, the value is found in fileB, so "id1.2" (column #3 of fileA) should be replaced by "ND1" (column #3 of fileB). In the second line of fileA, the value is not found in fileB, so that line should be left unchanged.

An added difficulty is that the values are not identical between fileA and fileB: only the part before the ".2" has to be the same (e.g. "id1" in fileB vs. "id1.2" in fileA).

Original files:

> cat fileA.txt
chr1    gene_id "id1.2";
chr1    gene_id "id2.2";

> cat fileB.txt
id1 protein_coding  ND1 MT

Desired output (take the value in column #3 of fileB and, if there's a match, put it in column #3 of fileA):

> cat fileA.txt
chr1    gene_id "ND1";
chr1    gene_id "id2.2";

I tried something inspired by this post, but it's not working (I'm not sure I really understood this awk line, as it's the first time I'm using this syntax):

awk -F ' ' 'NR==FNR{a[$1]=$3;next}{$3=a[$3];}1' fileB.txt fileA.txt

Any help would be more than welcome.

RavinderSingh13 · Accepted Answer

Could you please try the following? It is based on your samples only, so change the column numbers to match your real input files.

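awk -v s1="\"" '
# First file (fileB.txt): map column 1 (id) to column 3 (name).
FNR==NR{
   a[$1]=$3
   next
}
# Second file (fileA.txt): strip quotes, semicolon and the ".N" suffix from column 3.
{
   val=$3
   gsub(/"|;|\..*/,"",val)
}
# If the cleaned id is known, rebuild column 3 as "name";
(val in a){
   $3=s1 a[val] s1";"
}
# Print every line, changed or not.
1
' fileB.txt fileA.txt |
   column -t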
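
For context, the one-liner in the question fails for two reasons: it looks up the raw field "id1.2"; in the array, whose keys are the bare ids such as id1, and it assigns $3 unconditionally, so every line ends up with an empty third column. Below is a minimal sketch of a variant that avoids the `column -t` step by editing the quoted value inside the line rather than reassigning $3, which leaves the original spacing untouched. It assumes the first double-quoted string on each line of fileA.txt is the gene_id value (true for the samples, not necessarily for your real files). The temporary directory and the names `map`, `key` and `result` are only for illustration.

```shell
# Sketch only: rebuild the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
printf 'chr1\tgene_id "id1.2";\nchr1\tgene_id "id2.2";\n' > "$tmp/fileA.txt"
printf 'id1\tprotein_coding\tND1\tMT\n' > "$tmp/fileB.txt"

result=$(awk '
FNR==NR { map[$1] = $3; next }       # first file (fileB.txt): id -> replacement
{
    key = $3
    gsub(/[";]|\..*/, "", key)       # "id1.2"; becomes id1
    if (key in map)
        sub(/"[^"]*"/, "\"" map[key] "\"")   # assumption: first quoted string is the gene_id
    print
}' "$tmp/fileB.txt" "$tmp/fileA.txt")

printf '%s\n' "$result"
rm -r "$tmp"
```

With the sample files, the first line comes out with "ND1" in place of "id1.2" and the second line is printed unchanged, with the original tab between chr1 and gene_id preserved.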
