rororo

Reputation: 845

AWK: search substring in first file against second

I have the following files:

data.txt

Estring|0006|this_is_some_random_text|more_text
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here

allids.txt (the columns are separated by a semicolon here; the real input is tab-delimited)

Estring|0006;MAR0593
Fstring|0002;MAR0592
Fstring|0028;MAR1195

Please note: in data.txt, the important part is the first two "columns" (name|number).

Now I want to use awk to look up the first part (name|number) of each data.txt line in allids.txt and append the matching second column (the value starting with MAR).

So my expected output would be (again tab-delimited in the real data):

Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195

I do not know how to match on that first, conserved part within awk; the rest should then be something like:

awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$0], [$1] }' data.txt allids.txt 

Upvotes: 0

Views: 90

Answers (2)

anubhava

Reputation: 785196

Here is another awk solution that uses a different field separator for each file:

awk -F ';' 'NR==FNR{a[$1]=FS $2; next} {k=$1 FS $2} 
    k in a{$0=$0 a[k]} 1' allids.txt FS='|' data.txt

Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195

This command uses ; as FS for allids.txt and | as FS for data.txt. The FS='|' operand placed between the two file names takes effect only after allids.txt has been read.
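For the real, tab-delimited allids.txt the same idea might look like the sketch below. It is only an illustration: the sample rows are copied from the question with a tab in place of the semicolon, and the temporary directory and the names tmp, result, id and key are made up for this example.

```shell
# Sketch only: sample rows from the question, with a tab instead of ';'
# in allids.txt. The temp directory and variable names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' \
  'Estring|0006|this_is_some_random_text|more_text' \
  'Fstring|0010|random_combination_of_characters' \
  'Fstring|0028|again_here' > "$tmp/data.txt"
printf '%s\t%s\n' \
  'Estring|0006' 'MAR0593' \
  'Fstring|0002' 'MAR0592' \
  'Fstring|0028' 'MAR1195' > "$tmp/allids.txt"

# FS is a tab while allids.txt is read; the FS="|" operand switches it
# just before data.txt is opened.
result=$(awk -F '\t' '
  NR == FNR { id[$1] = $2; next }      # first file: name|number -> MAR id
  { key = $1 "|" $2 }                  # second file: rebuild name|number
  key in id { $0 = $0 "\t" id[key] }   # append the id when the key is known
  1                                    # print every data.txt line
' "$tmp/allids.txt" FS='|' "$tmp/data.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Lines of data.txt without a match (here Fstring|0010) are printed unchanged, which is what the expected output in the question asks for.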

Upvotes: 2

hek2mgl

Reputation: 158010

I would use a set of field delimiters, like this:

awk -F'[|\t;]' 'NR==FNR{a[$1"|"$2]=$0; next}
                $1"|"$2 in a {print a[$1"|"$2]"\t"$NF}' data.txt allids.txt

With your real, tab-delimited data you can remove the ; from the delimiter set. It is only in there so that the example from the question can be reproduced.
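With the ; dropped, the command for tab-delimited input might look like the following sketch. The sample rows come from the question (with a tab in allids.txt); the temporary directory and the names tmp, matched and line are assumptions made for this illustration. Note that this form prints only the records whose name|number key occurs in both files.

```shell
# Sketch only: sample rows from the question, allids.txt tab-delimited.
# The temp directory and variable names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' \
  'Estring|0006|this_is_some_random_text|more_text' \
  'Fstring|0010|random_combination_of_characters' \
  'Fstring|0028|again_here' > "$tmp/data.txt"
printf '%s\t%s\n' \
  'Estring|0006' 'MAR0593' \
  'Fstring|0002' 'MAR0592' \
  'Fstring|0028' 'MAR1195' > "$tmp/allids.txt"

# Both "|" and tab split fields, so $1 and $2 are name and number in either file.
matched=$(awk -F'[|\t]' '
  NR == FNR { line[$1 "|" $2] = $0; next }               # data.txt: key -> whole line
  ($1 "|" $2) in line { print line[$1 "|" $2] "\t" $NF } # allids.txt: append last field
' "$tmp/data.txt" "$tmp/allids.txt")
printf '%s\n' "$matched"
rm -rf "$tmp"
```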

Upvotes: 2
