Reputation: 509
I'm trying to wrap certain substrings in a text file, corpus.txt, with marker tags using sed. The list of substrings to look for is in a file, sub.txt, containing the following:
dogs chase
birds eat
chase birds
chase cat
chase birds .
and corpus.txt contains the following text:
dogs chase cats around
dogs bark
cats meow
dogs chase birds
cats chase birds , birds eat grains
dogs chase the cats
the birds chirp
The desired output is:
<bop> dogs chase <eop> cats around
dogs bark
cats meow
<bop> dogs chase <eop> birds
cats <bop> chase birds <eop> , <bop> birds eat <eop> grains
<bop> dogs chase <eop> the cats
the birds chirp
Using the command
sed -f <(sed 's/.*/s|\\b&\\b|<bop> & <eop>|g/' sub.txt) corpus.txt
every line comes out as desired except the fifth, which comes out as:
cats <bop> <bop> chase birds . <eop>eop> , <bop> birds eat <eop> grains
What can I do to get this to work?
Upvotes: 0
Views: 63
Reputation: 67567
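You have to escape the . in the first file so that it matches a literal dot. Unescaped, . matches any character, so the rule generated from "chase birds ." also matches the "chase birds <" left behind by the earlier "chase birds" rule and wraps it a second time.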
$ sed -f <(sed 's/\./\\./g;s/.*/s|\\b&\\b|<bop> & <eop>|g/' sub.txt) corpus.txt
<bop> dogs chase <eop> cats around
dogs bark
cats meow
<bop> dogs chase <eop> birds
cats <bop> chase birds <eop> , <bop> birds eat <eop> grains
<bop> dogs chase <eop> the cats
the birds chirp
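If the phrase list might contain regex metacharacters other than the dot, the same idea can be generalized by escaping all of them before the rules are generated. The following is only a sketch: make_rules and mark_phrases are hypothetical helper names, it assumes GNU sed (for \b, as in the question), and it assumes no phrase contains | or &, since those would clash with the delimiter and the replacement text.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original answer.
# Assumes GNU sed (for \b) and phrases that contain no "|" or "&".

# Print one sed "s" command per phrase in file $1.
make_rules() {
    # 1st expression: backslash-escape ] [ \ . * ^ $ so they match literally.
    # 2nd expression: wrap the escaped phrase in an s|...|...|g command.
    sed -e 's/[][\.*^$]/\\&/g' \
        -e 's/.*/s|\\b&\\b|<bop> & <eop>|g/' "$1"
}

# Apply the rules generated from phrase file $1 to corpus file $2.
mark_phrases() {
    rules=$(mktemp) || return 1
    make_rules "$1" > "$rules"
    sed -f "$rules" "$2"
    status=$?
    rm -f "$rules"
    return "$status"
}
```

Calling mark_phrases sub.txt corpus.txt would then print the tagged corpus. Writing the rules to a temporary file instead of using <( ... ) keeps the sketch usable in shells without process substitution. One caveat: \b directly after a trailing . only matches when a word character follows the dot, so a phrase like "chase birds ." will not match before a space or at the end of a line.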
Upvotes: 2