Nicolas Rosewick

Reputation: 1998

Replace multiple strings in file using a mapping file

How can I replace multiple strings in one big file (500K+ lines) using a mapping file (50K+ lines)? The mapping file is structured like this:

A1  B1
A2  B2
A3  B3
..  ..

and the big file is structured like this:

A1  A2
A1  A3
A1  A8
A2  A1
A2  A3
A3  A10
A3  A13

Every string in the big file has to be replaced using the mapping file.

Wanted result:

B1  B2
B1  B3
B1  B8
B2  B1
B2  B3
B3  B10
B3  B13

I tried running awk once per line of the mapping file, but it takes a very long time. I wrote a loop that launches one awk command for each line of the mapping file, saves the result in a temporary file, and feeds that file to the next awk command together with the next mapping line (not very efficient, I know). Here is the awk command for a single mapping:

cat inputBigFile.txt | awk '{ gsub( "A1","B1" );}1' > out.txt

Thanks in advance

Upvotes: 3

Views: 2807

Answers (1)

Adrian Frühwirth

Reputation: 45576

$ awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if($2 in map)$2=map[$2]}1' mappings file
B1 B2
B1 B3
B1 A8
B2 B1
B2 B3
B3 A10
B3 A13
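
For readers unfamiliar with the NR==FNR idiom, here is the same one-liner spelled out with comments. This is only a sketch: the temporary directory and the small sample files are made up for illustration, and it assumes a non-empty mapping file (with an empty one, NR==FNR would also be true for the data file).

```shell
#!/bin/sh
# Hypothetical sample data, created in a throwaway directory.
tmp=$(mktemp -d)
printf 'A1 B1\nA2 B2\nA3 B3\n' > "$tmp/mappings"
printf 'A1 A2\nA1 A8\nA3 A10\n' > "$tmp/file"

result=$(awk '
    # NR (global line count) equals FNR (per-file line count) only
    # while the first file is read, so this block loads the mappings.
    NR == FNR { map[$1] = $2; next }
    # Second file: replace each of the two columns if it has a mapping.
    {
        if ($1 in map) $1 = map[$1]
        if ($2 in map) $2 = map[$2]
        print
    }
' "$tmp/mappings" "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The whole mapping is held in memory as an awk array, so each line of the big file costs two hash lookups instead of one pass over the file per mapping.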

I assume that explicitly checking and replacing just the two columns is faster than looping over NF and/or using gsub.

EDIT: Checking the two columns explicitly is indeed significantly faster than looping over all fields:

$ wc -l file
8388608 file


$ time awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if ($2 in map)$2=map[$2]}1' mappings file >/dev/null
real    0m6.941s
user    0m6.904s
sys     0m0.016s


$ time awk 'NR==FNR{map[$1]=$2;next} {for(i=1;i<=NF;i++)$i=($i in map)?map[$i]:$i}1' mappings file >/dev/null
real    0m10.311s
user    0m10.249s
sys     0m0.036s


$ awk --version | head -n 1
GNU Awk 3.1.8
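
One side effect to be aware of: assigning to $1 or $2 makes awk rebuild the line using the output field separator, which is a single space by default, so the original spacing is not preserved. If the files are tab-separated (an assumption, the question does not say), something along these lines should keep the tabs. The file names and sample data are again hypothetical.

```shell
#!/bin/sh
# Assumption: both files are tab-separated. Sample data is made up.
tmp=$(mktemp -d)
printf 'A1\tB1\nA2\tB2\n' > "$tmp/mappings"
printf 'A1\tA2\nA2\tA9\n' > "$tmp/file"

# FS splits input on tabs; OFS is what awk puts between fields
# when it rebuilds a record after a field assignment.
out=$(awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR { map[$1] = $2; next }
    { if ($1 in map) $1 = map[$1]; if ($2 in map) $2 = map[$2] }
    1' "$tmp/mappings" "$tmp/file")

printf '%s\n' "$out"
rm -rf "$tmp"
```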

Upvotes: 5
