j.doe

Reputation: 65

How to delete all the lines after the last occurrence of a pattern?

I want to delete all the lines after the last occurrence of a pattern, keeping the matching line itself.

file.txt

honor
apple
redmi
nokia
apple
samsung
lg
htc

What I want file.txt to contain:

honor
apple
redmi
nokia
apple

What I have tried:

sed -i '/apple/q' file.txt

This deletes all the lines after the first occurrence of the pattern instead:

honor
apple

Upvotes: 5

Views: 1842

Answers (7)

potong

Reputation: 58578

This might work for you (GNU sed):

sed '/apple/,$!b;//!H;//{x;//p;x;h};${x;P};d' file

Lines before the first appearance of apple are printed as usual. From there to the end of the file, lines that do not contain apple are appended to the hold space (HS). When a line does contain apple, swap to the HS and, if the HS contains apple, print it (the previous apple line plus everything buffered after it), then replace the HS with the current line. At the end of the file, print only the first line of the HS, which is the last apple line. Every line in the range is then deleted from the pattern space, so nothing else is printed.

If slurping a large file is not a problem use:

sed -rz 's/(.*apple[^\n]*).*/\1\n/' file

This relies on the greedy .* to capture everything up to and including the last line containing apple.

Upvotes: 1

Thor

Reputation: 47239

Given that you are dealing with large input I would go with a two-pass coreutils solution, e.g.:

n=$(grep -Fn apple infile | tail -n1 | cut -d: -f1)
[ -n "$n" ] && head -n$n infile > outfile

This uses grep's fixed-string matching (-F) to find every line containing apple; tail keeps the last match and cut extracts its line number. Then head is used to extract that many lines.

You did not specify what happens when no apples are found, so this solution does nothing when that occurs.
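Since the question used sed -i, here is a rough sketch of the same grep/head idea applied in place. The function name trim_in_place is made up for illustration, and leaving the file untouched when nothing matches is an assumed policy, not something the question specifies.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): keep FILE only up to and
# including the last line that contains the fixed string PATTERN.
trim_in_place() {
    pattern=$1
    file=$2
    # Line number of the last match; empty when there is no match.
    n=$(grep -Fn -- "$pattern" "$file" | tail -n 1 | cut -d: -f1)
    # Assumed policy: no match means the file is left as it is.
    [ -n "$n" ] || return 0
    tmp=$(mktemp) || return 1
    # Copy back with cat rather than mv so the original file keeps
    # its permissions and inode.
    head -n "$n" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would be something like trim_in_place apple file.txt. The temporary file is needed because head cannot safely read from and write to the same file at once.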

Upvotes: 0

dawg

Reputation: 104102

If you don't mind having everything in memory, you can do:

$ awk '/^apple$/{last=NR} 
              {lines[NR]=$0}
     END{for(li=1;li<=last;li++) print lines[li]}' file
honor
apple
redmi
nokia
apple

Upvotes: 0

Ed Morton

Reputation: 204638

Simple, robust 2-pass approach using almost no memory:

$ awk 'NR==FNR{if (/apple/) hit=NR; next} {print} FNR==hit{exit}' file file
honor
apple
redmi
nokia
apple

If that doesn't execute fast enough THEN it's time to try some alternatives to see if any produce a performance improvement.

Upvotes: 7

karakfa

Reputation: 67567

Here is another awk solution that does not scan the file twice:

$ awk '/apple/ {f=1; printf "%s", buf; print; buf=""; next}
       f       {buf=buf $0 ORS}
       !f' file

honor
apple
redmi
nokia
apple

Upvotes: 0

William Pursell

Reputation: 212664

If you don't want to reverse the file as Barmar suggests, you will either have to read the file in reverse using lower-level tools (e.g., fseek) or read it twice:

sed $(awk '/apple/{a=NR}END{print a+1}' input),\$d input

(Note that if the pattern does not appear in the file, this will output nothing. That's an edge case you should worry about.)
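One way to guard against that edge case is sketched below. The function name truncate_after_last is hypothetical, and printing the file unchanged when the pattern is absent is only an assumed policy; you may prefer an error instead.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print FILE up to and
# including the last line matching the awk regex PATTERN.
truncate_after_last() {
    pattern=$1
    file=$2
    # First pass: line number of the last match, or 0 if there is none.
    last=$(awk -v pat="$pattern" '$0 ~ pat { n = NR } END { print n + 0 }' "$file")
    if [ "$last" -gt 0 ]; then
        # Second pass: print lines 1..last, then quit without reading the rest.
        sed "${last}q" "$file"
    else
        # Assumed policy: no match means the input is passed through as is.
        cat "$file"
    fi
}
```

For example, truncate_after_last apple input > output. Note that awk -v processes backslash escapes in the pattern, so patterns containing backslashes need extra care.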

Upvotes: 1

Barmar

Reputation: 782693

Reverse the file, print everything starting from the first occurrence of the pattern, then reverse the result:

tac file.txt | sed -n '/apple/,$p' | tac > newfile.txt

Alternatively, you can find the line number of the last match, then use that to print the first N lines of the file:

line=$(awk '/apple/ { line=NR } END {print line}' file.txt)
head -n "$line" file.txt > newfile.txt

Upvotes: 5
