Perl Multiple search and replace actions in one large text file

Question

Given a set of replacement strings in a file replacements.txt like

s/string1/replacement1/g;
s/string2/replacement2/g;
s/string3/replacement3/g;
s/string4/replacement4/g;
s/string5/replacement5/g;

I would like to obtain the equivalent of

sed -f replacements.txt infile.txt

~~My file is so big that sed can't handle it, while I know that perl could do the trick.~~

Also, there are a lot of replacements, and they change from time to time (I need to run this about a dozen times).

Note that the replacements are fixed strings, so I do not really need those to be regular expressions.

sed has problems only when the regexp has globs and the input file is a single large line.

mklement0 · Accepted Answer

The perl equivalent of your sed command is:

perl -p replacements.txt infile.txt

It should work with your sample replacements.txt, given that the s statements are properly ;-terminated (note that sed would recognize the end of the line by itself as the statement terminator).

The real problem, however, is that the entire large file is a single line, so the key to avoiding running out of memory is to:

temporarily break that line into many short lines,
send these short lines through the pipeline and perform the string replacements on them,
and then re-join the modified short lines to form a single line again.

If there's a character in the data that delimits records (units of data) in a way that doesn't interfere with the string replacements, breaking the long line into multiple ones with the help of tr is a viable approach; I'll use } as an example, because Kuzeko states that the data is JSON-like:

If you have GNU sed (Linux; verify with sed --version):

tr '}' '\0' < infile.txt | sed -z -f replacements.txt | tr '\0' '}'

Having tr output NUL-separated "lines" (\0) and sed read them accordingly (-z) is the most robust way to handle the chunking.
Unfortunately, the -z / --null-data option is not POSIX-compliant and the BSD/macOS implementation does not support it.
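
To see the NUL-based chunking on a small input, here is a sketch that assumes GNU sed and GNU tr. The sample data and scratch directory are invented for illustration; the real infile.txt would be the single large line.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 's/string1/replacement1/g;\ns/string2/replacement2/g;\n' > "$workdir/replacements.txt"
# A single "line" with no trailing newline, using } as the record delimiter.
printf '{"k":"string1"}{"k":"string2"}' > "$workdir/infile.txt"

# Replace each } with NUL, let sed treat NUL as the line separator (-z),
# then turn every NUL back into }.
out=$(tr '}' '\0' < "$workdir/infile.txt" |
  sed -z -f "$workdir/replacements.txt" |
  tr '\0' '}')
printf '%s\n' "$out"

rm -rf "$workdir"
```

Because sed only ever holds one }-delimited record at a time, memory use depends on the longest record rather than on the size of the whole file.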

Otherwise (e.g., on macOS):

tr '}' '\n' < infile.txt | perl -p replacements.txt | tr '\n' '}'

Caveat: If the single line in infile.txt has a trailing \n, you'll end up with an extra } character at the end; to prevent that, add an initial tr stage to the pipeline that deletes the \n:
tr -d '\n' < infile.txt | tr '}' '\n' | ...

perl is still needed because, unlike BSD/macOS sed, it preserves the trailing-\n-or-not status of the input's last line.
