Reputation: 5978

Remove a single line from a very large file using sed

I am trying to remove a single line (the first match) towards the beginning of a 250GB file. The command I have is this:

sed -i '0,/matchstring/{/matchstring/d;}' file

This works great on a smaller file, but on the big file, it never exits. I understand there is q to tell sed to exit early, but I can't figure out how to add that into what I have here.

Upvotes: 0

Answers (3)

pynexj

Reputation: 20698

Since the OP mentioned in a comment that

... something that replaces it with blanks works great, the line doesn't need to be removed.

The following solution assumes the line is near the beginning of the file (otherwise a few more tricks are needed to get the exact offset of the line, in which case the intermediate "header" file can be avoided).

[STEP 101] $ cat file
hello 1
hello 2
hello 3
foo matchstring bar
hello 4
hello 5
hello 6
[STEP 102] $ sed '/matchstring/{s/./ /g;q;}' file > header
[STEP 103] $ cat header
hello 1
hello 2
hello 3

[STEP 104] $ dd conv=notrunc if=header of=file
0+1 records in
0+1 records out
44 bytes copied, 0.000892102 s, 49.3 kB/s
[STEP 105] $ cat file
hello 1
hello 2
hello 3

hello 4
hello 5
hello 6
[STEP 106] $

If turning the matched line into a comment line also works for you, just change s/./ /g to, for example, s/./#/ in the sed command.
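
When the line is not near the beginning, copying everything before it into a header file gets expensive. One possible sketch of the offset-based variant hinted at above: ask grep for the byte offset of the first match, then let dd overwrite just that line in place. This assumes GNU grep (for -a, -b and -m) and a dd that supports conv=notrunc; the function name blank_first_match is made up for illustration.

```shell
# Hypothetical helper: overwrite the first line matching a pattern with
# spaces, in place, without rewriting the rest of the file.
# Assumes GNU grep (-a, -b, -m) and a dd that supports conv=notrunc.
blank_first_match() {
    pattern=$1
    file=$2

    # -b prefixes the matching line with its byte offset ("OFFSET:line"),
    # -m 1 stops reading after the first match, -a treats the file as text.
    hit=$(LC_ALL=C grep -a -b -m 1 -e "$pattern" -- "$file") || return 1

    offset=${hit%%:*}   # everything before the first colon
    line=${hit#*:}      # the matched line, without its trailing newline

    # Length of the line in bytes; the arithmetic expansion strips any
    # padding that wc may print.
    len=$(( $(printf '%s' "$line" | LC_ALL=C wc -c) ))
    [ "$len" -gt 0 ] || return 0

    # Write $len spaces at $offset. conv=notrunc keeps the rest of the
    # file; seek only repositions the write pointer, so nothing before
    # the line is copied.
    printf "%${len}s" '' | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}
```

Calling blank_first_match matchstring file should leave the file size unchanged. Try it on a copy first, since the overwrite cannot be undone.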

Upvotes: 2

tshiono

Reputation: 22022

If @pynexj's idea works, how about a Perl approach that replaces the target line with blanks of the same length, without changing the total file size?

perl -e '
use strict;
use warnings;

my $file = shift;                       # filename specified at the last line
open(my $fh, "+<", $file) or die "$file: $!";
                                        # open the file in read-write mode
while (<$fh>) {
    if (/matchstring/) {                # if the target string is found
        my $pos = tell($fh);            # get the current position (start of the next line)
        $pos -= length;                 # rewind to the start of the target line
        seek($fh, $pos, 0);             # update the file pointer
        my $spaces = " " x (length($_) - 1) . "\n";
                                        # generate a string of spaces of the same length
        print $fh $spaces;              # overwrite the current line with the spaces
        close($fh);                     # close the file
        exit;                           # exit the script
    }
}
' "file"

[Update]
I benchmarked the Perl script on a generated 100GB file with the target string halfway through the file. It completed in 15 minutes on my 10-year-old laptop with a 2.5" HDD (not an SSD). Modern machines will run much faster.

Upvotes: 2

KamilCuk

Reputation: 141040

I can't figure out how to add that into what I have here.

Just replace d with q...

sed '0,/matchstring/{/matchstring/q}'

Because q exits, and you are matching from the first line anyway, you could just use:

sed '/matchstring/q'

Oh, maybe you do not want the line with matchstring in the output. Then you could print each line, but quit without printing when the line contains matchstring:

sed -n '/matchstring/q;p'
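
Note that none of these commands uses -i, so they print the part of the file up to the match instead of editing it. A small demonstration on a throwaway file (the sample contents and variable names are made up here) shows what the last two variants emit:

```shell
# Throwaway sample file; contents are illustrative only.
sample=$(mktemp)
printf 'hello 1\nhello 2\nfoo matchstring bar\nhello 3\n' > "$sample"

# Quit at the first match; the matching line is still printed.
with_match=$(sed '/matchstring/q' "$sample")

# -n suppresses automatic printing; q fires before p on the matching
# line, so that line is not printed.
without_match=$(sed -n '/matchstring/q;p' "$sample")

printf '%s\n---\n%s\n' "$with_match" "$without_match"
rm -f "$sample"
```

In both cases sed stops reading at the match, which is why these return quickly even on a huge file. The lines after the match are not emitted.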

Upvotes: 1
