Reputation: 1998
How can I replace multiple strings in one big file (500K+ lines) using a mapping file (50K+ lines)? The mapping file is structured like this:
A1 B1
A2 B2
A3 B3
.. ..
and the big file is structured like this:
A1 A2
A1 A3
A1 A8
A2 A1
A2 A3
A3 A10
A3 A13
Every string in the big file has to be replaced using the mapping file.
Desired result:
B1 B2
B1 B3
B1 B8
B2 B1
B2 B3
B3 B10
B3 B13
I tried using awk on every line of the mapping file, but it takes a very long time. I wrote a loop that launches one awk command per line of the mapping file, saves the result in a temporary file, and feeds that result to a new awk call with the next mapping line (not very efficient, I know; a sketch of the loop is shown below the command). Here is the awk command for a single replacement:
cat inputBigFile.txt | awk '{ gsub( "A1","B1" );}1' > out.txt
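The full loop looks roughly like this (just a sketch of the approach described above; mappingFile.txt is a placeholder name for my mapping file):

cp inputBigFile.txt tmp.txt
while read -r old new; do
    # one full pass over the big file per mapping line
    awk -v o="$old" -v n="$new" '{ gsub(o, n) } 1' tmp.txt > tmp2.txt
    mv tmp2.txt tmp.txt
done < mappingFile.txt
mv tmp.txt out.txt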
Thanks in advance
Upvotes: 3
Views: 2807
Reputation: 45576
$ awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if($2 in map)$2=map[$2]}1' mappings file
B1 B2
B1 B3
B1 A8
B2 B1
B2 B3
B3 A10
B3 A13
I assume that specifically checking and replacing the two columns is faster than a loop over NF and/or using gsub.
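Spelled out with comments, the same script reads like this (identical logic, just expanded for readability):

awk '
    NR == FNR {                        # true only while reading the first file (mappings)
        map[$1] = $2                   # remember: old string -> new string
        next                           # skip the main block for mapping lines
    }
    {                                  # second file (the big file)
        if ($1 in map) $1 = map[$1]    # replace column 1 if it has a mapping
        if ($2 in map) $2 = map[$2]    # replace column 2 if it has a mapping
    }
    1                                  # print the (possibly rewritten) line
' mappings file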
EDIT: It is significantly faster:
$ wc -l file
8388608 file
$ time awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if ($2 in map)$2=map[$2]}1' mappings file >/dev/null
real 0m6.941s
user 0m6.904s
sys 0m0.016s
$ time awk 'NR==FNR{map[$1]=$2;next} {for(i=1;i<=NF;i++)$i=($i in map)?map[$i]:$i}1' mappings file >/dev/null
real 0m10.311s
user 0m10.249s
sys 0m0.036s
$ awk --version | head -n 1
GNU Awk 3.1.8
Upvotes: 5