Mathieu Vigouroux

Reputation: 15

Replace multiple patterns listed in a list_file with one replacement in a target_file using awk

I'm facing a problem.

1) I have a list_file intended to be used for in-place replacement, like this:

Replacement pattern ; Matching patterns

EXTRACT ___________________
toto ; tutu | tata | tonton  | titi 
bobo ; bibi | baba | bubu | bebe 
etc. 14000 lines !!!
_____________________________

2) I have a target file in which I want to replace those patterns

EXTRACT INPUT _______________
hello my name is bob and I am a Titi and I like bubu
_____________________________

I want it to become

EXTRACT OUTPUT ______________
hello my name is bob and I am a toto and I like bobo
_____________________________

For example, with one replacement:

echo 'toto; tutu | tata | tonton | titi ' | awk '{gsub(/ tutu | tata | tonton | titi /," toto ")}1'
gives
toto; toto | toto | toto | toto

With

awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1'

I expect to:

  1. register an array A with $2 as content and $1 as key, so for the first line $2 = ' tutu | tata | tonton | titi ' and $1 = ' toto '
  2. replace with gsub(/$2/,$1)}1, so for the first line: awk 'IGNORECASE = 1 {gsub(/ tutu | tata | tonton | titi /," toto ")}1'

Sadly, awk doesn't seem to understand the pipe character "|" as an OR indicator here. I have also tried to achieve this with sed; that works, but it runs very slowly :(

Does anyone have a better idea? Thanks, M

Upvotes: 0

Views: 729

Answers (1)

Ed Morton

Reputation: 203169

By putting the array reference inside regexp delimiters you're turning A[i] into literal characters in the regexp instead of an array that contains a regexp indexed by a string. Just don't do that. Also your placement of setting IGNORECASE makes no sense. Try this:

awk -F';' 'BEGIN{IGNORECASE = 1} NR==FNR{A[$1]=$2; next} {for(i in A) gsub(A[i],i)}1' list_file target_file
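To make the difference between a regexp constant and a dynamic regexp concrete, here is a minimal sketch. The input string 'Ai titi' and the single hard-coded array entry are made up purely for illustration; they are not taken from the question's files.

```shell
# /A[i]/ is a regexp constant: the letter "A" followed by the bracket
# expression [i]. It matches the literal text "Ai", not the array contents.
literal=$(echo 'Ai titi' | awk '{ A["toto"] = "tutu|titi"; for (i in A) gsub(/A[i]/, i) } 1')

# A[i] without slashes is a dynamic regexp: awk uses the string stored in
# the array, so "|" acts as alternation.
dynamic=$(echo 'Ai titi' | awk '{ A["toto"] = "tutu|titi"; for (i in A) gsub(A[i], i) } 1')

echo "$literal"   # toto titi
echo "$dynamic"   # Ai toto
```

Note that IGNORECASE is a GNU awk extension, so the one-liner above is only case-insensitive when run with gawk.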

I'm not saying it's a good idea but it might give you the output you're looking for. Stop using the word "pattern" btw as patterns are for quilts and sweaters - in text matching and replacing use either regexp or string, whichever one you mean in every context. You'll find it much easier to write and understand code if you understand where regexps vs strings occur.
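If the entries in the list are really plain words rather than regexps, one possible alternative is a hash lookup per word instead of one gsub() per list line, which should scale much better to 14000 lines. The following is only a sketch under stated assumptions: every alternative is a single whitespace-delimited word, matching is case-insensitive, and it is acceptable that awk rebuilds modified lines with single spaces. The temporary file names and the map, parts and alts variables are hypothetical.

```shell
#!/bin/sh
# Sample data copied from the question; the temp directory is a placeholder.
tmp=$(mktemp -d)
printf '%s\n' 'toto ; tutu | tata | tonton  | titi ' \
              'bobo ; bibi | baba | bubu | bebe ' > "$tmp/list_file"
printf '%s\n' 'hello my name is bob and I am a Titi and I like bubu' > "$tmp/target_file"

result=$(awk '
    # First file: map every lower-cased alternative to its replacement.
    NR == FNR {
        split($0, parts, ";")
        repl = parts[1]
        gsub(/^[ \t]+|[ \t]+$/, "", repl)          # trim the replacement
        n = split(parts[2], alts, "|")
        for (k = 1; k <= n; k++) {
            gsub(/^[ \t]+|[ \t]+$/, "", alts[k])   # trim each alternative
            if (alts[k] != "") map[tolower(alts[k])] = repl
        }
        next
    }
    # Second file: one array lookup per word, no regexp matching at all.
    {
        for (f = 1; f <= NF; f++) {
            key = tolower($f)
            if (key in map) $f = map[key]
        }
        print
    }
' "$tmp/list_file" "$tmp/target_file")

echo "$result"
rm -rf "$tmp"
```

Because it relies on tolower() rather than IGNORECASE, this sketch does not need gawk. It will not match a word that has punctuation attached (for example "titi,"), and it treats the list entries as strings, not regexps.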

Upvotes: 1
