Given a set of replacement strings in a file, replacements.txt, like
s/string1/replacement1/g;
s/string2/replacement2/g;
s/string3/replacement3/g;
s/string4/replacement4/g;
s/string5/replacement5/g;
I would like to obtain the equivalent of
sed -f replacements.txt infile.txt
My file is so big that sed can't handle it, while I know that perl could do the trick.
Also, there are a lot of replacements, and they change from time to time (I need to run this a dozen times).
Note that the replacements are fixed strings, so I do not really need those to be regular expressions.
sed has problems only when the regexp has globs and the input file is a single large line.
The perl equivalent of your sed command is:
perl -p replacements.txt infile.txt
It should work with your sample replacements.txt, given that the s statements are properly ;-terminated (note that sed, unlike perl, would recognize the end of a line by itself as a statement terminator).
The real problem, however, is that the entire large file is a single line, so the key to avoiding running out of memory is to break that line into smaller chunks.
If there's a character in the data that delimits records (units of data) in a way that doesn't interfere with the string replacements, breaking the long line into multiple ones with the help of tr is a viable approach; I'll use } as an example, because Kuzeko states that the data is JSON-like:
If you have GNU sed (Linux; verify with sed --version):
tr '}' '\0' < infile.txt | sed -z -f replacements.txt | tr '\0' '}'
Having tr output NUL-separated "lines" (\0) and having sed read them accordingly (-z) is the most robust way to handle the chunking.
Unfortunately, the -z / --null-data option is not POSIX-compliant, and the BSD/macOS implementation does not support it.
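On a system with GNU sed, the NUL-based pipeline can be tried on a tiny stand-in for the real file. This is only a sketch: the JSON-like sample input, the temporary directory, and the result variable are invented, and sed -z will fail with BSD/macOS sed:

```shell
#!/bin/sh
# Sketch assuming GNU sed (for -z); the sample data is hypothetical.
dir=$(mktemp -d)
printf '%s\n' 's/string1/replacement1/g;' > "$dir/replacements.txt"

# One long line of }-terminated, JSON-like records (no trailing newline).
printf '%s' '{"a":"string1"}{"b":"string1"}' > "$dir/infile.txt"

# tr turns every } into a NUL, so sed -z processes one short NUL-terminated
# record at a time instead of the whole line; the last tr restores the }s.
result=$(tr '}' '\0' < "$dir/infile.txt" | sed -z -f "$dir/replacements.txt" | tr '\0' '}')
printf '%s\n' "$result"

rm -rf "$dir"
```

The same ;-terminated replacements file is used here as with perl; GNU sed simply treats the trailing ; as an empty command.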
Otherwise (e.g., on macOS):
tr '}' '\n' < infile.txt | perl -p replacements.txt | tr '\n' '}'
Caveat: If the single line in infile.txt has a trailing \n, you'll end up with an extra } character at the end; to prevent that, add an initial tr stage to the pipeline that deletes the \n:
tr -d '\n' < infile.txt | tr '}' '\n' | ...
perl is still needed here because, unlike BSD/macOS sed, it preserves whether or not the input's last line has a trailing \n.