Reputation: 13
I'm trying to substitute only the first match of one string in a huge single-line file (2.1 GB); this substitution will occur in a shell script job. The big problem is that the machine that will run this script has only 1 GB of memory (approximately 300 MB free), so I need a buffered strategy that doesn't overflow my memory. I already tried sed, perl, and a Python approach, but all of them gave me out-of-memory errors.
Here are my attempts (discovered in other questions):
# With perl
perl -pi -e '!$x && s/FROM_STRING/TO_STRING/ && ($x=1)' file.txt
# With sed
sed '0,/FROM_STRING/s//TO_STRING/' file.txt > file.txt.bak
# With python (in a custom script.py file)
import fileinput

# fileinput reads "lines", but this file has no newlines,
# so the whole 2.1 GB arrives as a single line
for line in fileinput.input('file.txt', inplace=True):
    print line.replace(FROM_STRING, TO_STRING, 1)
    break
One good point is that the FROM_STRING I'm searching for is always at the beginning of this huge one-line file, within the first 100 characters. Another good thing is that execution time is not a problem; it can take as long as it needs.
EDIT (SOLUTION):
I tested three solutions from the answers, and all of them solved the problem, thanks to all of you. I tested the performance with the Linux time command, and all of them took about the same time as well, roughly 10 seconds... But I chose @Miller's solution because it's simpler (it just uses perl).
Upvotes: 1
Views: 390
Reputation: 385655
You could use sysread to read large blocks instead of lines (binmode isn't needed), but since the match is known to be within the first 100 characters, I'd use
( head -c 100 | perl -0777pe's/.../.../' && cat ) <file.old >file.new
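For reference, a minimal sketch of the sysread approach mentioned above (the file names are illustrative, and it assumes the match lies within the first 100 bytes):
use strict;
use warnings;

open my $in,  '<', 'file.old' or die $!;
open my $out, '>', 'file.new' or die $!;

# the match can only occur in the first 100 bytes
sysread($in, my $head, 100) or die "could not read header: $!";
$head =~ s/FROM_STRING/TO_STRING/;
print $out $head;

# stream the remaining ~2.1 GB in 1 MiB blocks,
# never holding the whole file in memory
while (sysread($in, my $buf, 1 << 20)) {
    print $out $buf;
}
close $out or die $!;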
Upvotes: 1
Reputation: 1826
Since you know that your string is always in the first chunk of the file, you should use dd for this.
You'll also need a temporary file to work with, as in tmpfile="$(mktemp)".
First, copy the first block of the file to a new, temporary location:
dd bs=32k count=1 if=file.txt of="$tmpfile"
Then, do your substitution on that block:
sed -i 's/FROM_STRING/TO_STRING/' "$tmpfile"
Next, concatenate the new first block with the rest of the old file, again using dd:
dd bs=32k if=file.txt of="$tmpfile" seek=1 skip=1
EDIT: As per Mark Setchell's suggestion, I have added bs=32k to these commands to speed up the dd operations. This is tunable to your needs, but be careful if you tune the commands separately: skip= counts ibs-sized blocks of input while seek= counts obs-sized blocks of output, so the offsets of the two dd commands must stay consistent.
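Putting the three steps together, the whole job might look like the sketch below (the input name and the final mv are illustrative additions; also note the approach assumes FROM_STRING and TO_STRING have the same length, so the edited block stays exactly 32k):
#!/bin/sh
tmpfile="$(mktemp)"

# 1. copy the first 32k block to a temporary file
dd bs=32k count=1 if=file.txt of="$tmpfile"

# 2. substitute within that block only (must not change its length)
sed -i 's/FROM_STRING/TO_STRING/' "$tmpfile"

# 3. append everything after the first 32k of the original file
dd bs=32k if=file.txt of="$tmpfile" seek=1 skip=1

# 4. move the edited copy over the original
mv "$tmpfile" file.txt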
Upvotes: 6
Reputation: 75896
A practical (not very compact, but efficient) approach would be to split the file, do the search-and-replace, and join the pieces again, e.g.:
head -c 100 myfile | sed 's/FROM/TO/' > output.1
tail -c +101 myfile > output.2
cat output.1 output.2 > output && /bin/rm output.1 output.2
Or, in one line:
( ( head -c 100 myfile | sed 's/FROM/TO/' ) && (tail -c +101 myfile ) ) > output
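To sanity-check the result without reading the whole file into memory, GNU cmp can compare both files starting at a byte offset (this assumes TO has the same length as FROM, so the offsets line up):
# compare everything after the edited first 100 bytes
cmp myfile output 100 100 && echo "tail unchanged"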
Upvotes: 0
Reputation: 35198
If you're certain the string you're trying to replace is just in the first 100 characters, then the following perl one-liner should work:
perl -i -pe 'BEGIN {$/ = \1024} s/FROM_STRING/TO_STRING/ .. undef' file.txt
Switches:
-i : Edit <> files in place (makes backup if extension supplied).
-p : Creates a while(<>){...; print} loop for each "line" in your input file.
-e : Tells perl to execute the code on the command line.
Code:
BEGIN {$/ = \1024} : Sets the $INPUT_RECORD_SEPARATOR to the number of characters to read for each "line".
s/FROM/TO/ .. undef : Uses a flip-flop to perform the regex only once. Could also have used if $. == 1.
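As a quick demonstration on a small sample file (the file name and strings are illustrative), only the first occurrence is replaced:
printf 'FROM_STRING then FROM_STRING again' > sample.txt
perl -i -pe 'BEGIN {$/ = \1024} s/FROM_STRING/TO_STRING/ .. undef' sample.txt
cat sample.txt
# prints: TO_STRING then FROM_STRING again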
Upvotes: 2
Reputation: 98388
Untested, but I would do:
perl -pi -we 'BEGIN{$/=\65536} s/FROM_STRING/TO_STRING/ if 1..1' file.txt
to read in 64k chunks.
Upvotes: 0