Replace substrings in a text file using a file of strings with sed

I'm trying to wrap certain substrings in a text file (corpus.txt) in <bop> ... <eop> markers using sed. The substrings to look for are listed in a file sub.txt, which contains the following:

dogs chase
birds eat
chase birds
chase cat
chase birds .

and corpus.txt contains the following text:

dogs chase cats around
dogs bark
cats meow
dogs chase birds
cats chase birds , birds eat grains
dogs chase the cats
the birds chirp

The desired output is:

<bop> dogs chase <eop> cats around
dogs bark
cats meow
<bop> dogs chase <eop> birds 
cats <bop> chase birds <eop> , <bop> birds eat <eop> grains
<bop> dogs chase <eop> the cats
the birds chirp

Using the command sed -f <(sed 's/.*/s|\\b&\\b|<bop> & <eop>|g/' sub.txt) corpus.txt, every line comes out as desired except the fifth, which becomes:

cats <bop> <bop> chase birds . <eop>eop> , <bop> birds eat <eop> grains

What can I do to get this to work?

Upvotes: 0

Views: 63

Answers (1)

karakfa

Reputation: 67567

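You have to escape the . in the substring file (sub.txt in the question) so that it matches a literal period. Unescaped, . matches any character, so the pattern generated from "chase birds ." also matches the "chase birds <" left behind by the earlier "chase birds" substitution. That second match is where the doubled <bop> and the stray eop> come from.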
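To see the failure in isolation, here is a minimal sketch that feeds sed only the fifth line as it looks after the "chase birds" rule has already run. It assumes GNU sed, since \b is a GNU extension; the variable names line, unescaped and escaped are made up for this illustration.

```shell
# Line 5 of corpus.txt after the earlier "chase birds" rule has been applied
# (trailing part of the line omitted for brevity).
line='cats <bop> chase birds <eop> ,'

# Unescaped dot: "." matches the "<" of "<eop>", so the rule fires a second time.
unescaped=$(printf '%s\n' "$line" | sed 's|\bchase birds .\b|<bop> chase birds . <eop>|g')

# Escaped dot: the line contains no literal "chase birds .", so it is left alone.
escaped=$(printf '%s\n' "$line" | sed 's|\bchase birds \.\b|<bop> chase birds . <eop>|g')

printf '%s\n' "$unescaped" "$escaped"
```

The first printed line should show the same doubled <bop> and stray eop> as in the question, while the second should be identical to the input.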

$ sed -f <(sed 's/\./\\./g;s/.*/s|\\b&\\b|<bop> & <eop>|g/' sub_o.txt) file

<bop> dogs chase <eop> cats around
dogs bark
cats meow
<bop> dogs chase <eop> birds
cats <bop> chase birds <eop> , <bop> birds eat <eop> grains
<bop> dogs chase <eop> the cats
the birds chirp
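The same idea extends to other regex metacharacters. The following is a sketch, not a definitive solution: it recreates the question's two files in a scratch directory, escapes ] [ \ . * ^ $ in every phrase, and writes the generated rules to a file (rules.sed, a name chosen here for illustration) instead of using process substitution. It assumes GNU sed for \b and assumes the phrases contain neither | (the delimiter) nor &, which would need extra handling.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
printf '%s\n' 'dogs chase' 'birds eat' 'chase birds' 'chase cat' 'chase birds .' > sub.txt
printf '%s\n' 'dogs chase cats around' 'dogs bark' 'cats meow' 'dogs chase birds' \
  'cats chase birds , birds eat grains' 'dogs chase the cats' 'the birds chirp' > corpus.txt

# Step 1: backslash-escape ] [ \ . * ^ $ in each phrase,
# then wrap the phrase in an s|\b...\b|<bop> ... <eop>|g command.
sed 's/[][\.*^$]/\\&/g; s/.*/s|\\b&\\b|<bop> & <eop>|g/' sub.txt > rules.sed

# Step 2: apply the generated rules to the corpus (\b requires GNU sed).
out=$(sed -f rules.sed corpus.txt)
printf '%s\n' "$out"
```

Keeping the generated rules in a file also makes them easy to inspect when a substitution misbehaves: cat rules.sed shows exactly which patterns sed will run.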

Upvotes: 2
