Reputation: 5978
I am trying to remove a single line (the first match) near the beginning of a 250GB file. The command I have is this:
sed -i '0,/matchstring/{/matchstring/d;}' file
This works great on a smaller file, but on the big file it never exits. I understand there is q to tell sed to exit early, but I can't figure out how to add that into what I have here.
Upvotes: 0
Views: 696
Reputation: 20698
Since OP mentioned in the comment that
... something that replaces it with blanks works great, the line doesn't need to be removed.
The following solution assumes the line is near the beginning of the file. (Otherwise a few more tricks are needed to get the exact byte offset of the line, so that the intermediate "header" file can be avoided.)
[STEP 101] $ cat file
hello 1
hello 2
hello 3
foo matchstring bar
hello 4
hello 5
hello 6
[STEP 102] $ sed '/matchstring/{s/./ /g;q;}' file > header
[STEP 103] $ cat header
hello 1
hello 2
hello 3
                   
[STEP 104] $ dd conv=notrunc if=header of=file
0+1 records in
0+1 records out
44 bytes copied, 0.000892102 s, 49.3 kB/s
[STEP 105] $ cat file
hello 1
hello 2
hello 3
                   
hello 4
hello 5
hello 6
[STEP 106] $
If turning the matched line into a comment line also works for you, just change s/./ /g to, for example, s/./#/ in the sed command.
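For a match that is not near the start of the file, the "exact offset" trick mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: blank_first_match is a hypothetical helper name, the file is plain text, and grep supports -b (print byte offset) and -m 1 (stop at the first match), as GNU grep does. It writes the spaces directly at the offset, so no header file is needed.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer above): overwrite the first line
# matching a pattern with spaces, in place, keeping the file size unchanged.
# Assumes a grep with -b (byte offset) and -m 1 (stop at first match).
blank_first_match() {
    pattern=$1
    file=$2
    # grep -b prints "OFFSET:LINE"; LC_ALL=C makes grep work on raw bytes.
    hit=$(LC_ALL=C grep -b -m 1 -e "$pattern" -- "$file") || return 1
    offset=${hit%%:*}                    # byte offset of the start of the line
    line=${hit#*:}                       # the matched line, without its newline
    len=$(printf '%s' "$line" | wc -c)   # line length in bytes
    len=$((len))                         # strip any padding wc may add
    [ "$len" -gt 0 ] || return 0
    # Write exactly $len spaces at $offset; conv=notrunc keeps the rest intact.
    printf "%${len}s" '' | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

# Demo on a small scratch file
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'hello 1\nhello 2\nfoo matchstring bar\nhello 3\n' > "$demo"
blank_first_match matchstring "$demo"
cat "$demo"
```

grep still has to read the file up to the first match, but it stops there because of -m 1, and dd only touches the bytes of that one line.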
Upvotes: 2
Reputation: 22022
If @pynexj's idea works, how about a perl approach that overwrites the target line with blanks of the same length, without changing the total file size?
perl -e '
use strict;
use warnings;
my $file = shift @ARGV;                       # filename given as the last argument
open(my $fh, "+<", $file) or die "$file: $!"; # open the file in read/write mode
while (<$fh>) {
    if (/matchstring/) {                      # if the target string is found
        my $pos = tell($fh);                  # current position (start of the next line)
        $pos -= length;                       # rewind to the start of the target line
        seek($fh, $pos, 0);                   # move the file pointer there
        # generate a string of spaces of the same length as the line
        my $spaces = " " x (length($_) - 1) . "\n";
        print $fh $spaces;                    # overwrite the line with the spaces
        close($fh);                           # close the file
        exit;                                 # exit the script
    }
}
' "file"
[Update]
I have benchmarked the perl script by generating a 100GB file with the target string halfway through the file.
It completed in 15 minutes on my 10-year-old laptop equipped with a 2.5" HDD (not an SSD). Modern machines should run it much faster.
Upvotes: 2
Reputation: 141040
I can't figure out how to add that into what I have here.
Just replace d with q:
sed '0,/matchstring/{/matchstring/q}'
Because q exits, and you are matching from the first line anyway, you could just use:
sed '/matchstring/q'
Oh, maybe you do not want the line with matchstring in the output. Then you could print each line, but quit when the line contains matchstring:
sed -n '/matchstring/q;p'
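To see what the last two commands actually emit, here is a small sketch on a scratch file (the sample content and variable names are made up for illustration). Note that in both cases the lines after the match are not printed, since sed has already quit, so these commands only produce the head of the file.

```shell
#!/bin/sh
# Scratch file with the match on the third line (illustrative data only)
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'hello 1\nhello 2\nfoo matchstring bar\nhello 3\n' > "$sample"

# Quit at the first match; the matching line itself is still printed
with_match=$(sed '/matchstring/q' "$sample")

# -n suppresses auto-print, so q quits silently; p prints every earlier line
without_match=$(sed -n '/matchstring/q;p' "$sample")

printf '%s\n---\n%s\n' "$with_match" "$without_match"
```

The first variable holds the three lines up to and including the match, the second only the two lines before it.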
Upvotes: 1