How can I remove multiple lines from a file based on a pattern that spans multiple lines?

Question

I have a text formatted like the following:

2020-05-02
apple
string
string
string
string
string
2020-05-03
pear
string
string
string
string
string
2020-05-03
apple
string
string
string
string
string

Each group has 7 lines: a date, a fruit, and then 5 strings.

I would like to delete groups of 7 lines from the file by supplying just the date and the fruit.

So if I choose '2020-05-03' and 'pear', this would remove:

2020-05-03
pear
string
string
string
string
string

from the file, resulting in this:

2020-05-02
apple
string
string
string
string
string
2020-05-03
apple
string
string
string
string
string

The file contains thousands of lines. I need a command, probably using sed or awk, to:

Search for the date 2020-05-03
Check whether the line after the date is pear
Delete both lines and the following 5 lines

I know I can delete text with sed, e.g. sed 's/string//g', but I am not sure whether it can delete multiple lines.

Note: a given date followed by a given fruit is never repeated, so

2020-05-02
pear

would only occur once in the file

How can I achieve this?

anubhava · Accepted Answer

Using awk, you may do this:

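awk -v dt='2020-05-03' -v ft='pear' 'p && NR==p+1{del=($1==ft); if(!del) print held}
del && NR<=p+6{next} $1==dt{p=NR; held=$0; next} 1
END{if(p && p==NR) print held}' file

2020-05-02
apple
string
string
string
string
string
2020-05-03
apple
string
string
string
string
string

Explanation:

-v dt='2020-05-03' -v ft='pear': Supply the two values to awk from the command line
$1==dt{p=NR; held=$0; next}: If a line matches the date, store its line number in p, keep the line itself in held, and do not print it yet
p && NR==p+1{del=($1==ft); if(!del) print held}: On the line right after a matching date, set the flag del to 1 if the fruit matches and to 0 otherwise; if it does not match, print the held date line
del && NR<=p+6{next}: While del is set, skip the fruit line and the 5 strings after it (lines p+1 through p+6)
1: Default action to print the line
END{if(p && p==NR) print held}: If a matching date was the very last line of the input, print it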
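
The awk command writes the filtered text to standard output and leaves the file itself unchanged. Below is a minimal sketch of one way to apply the deletion to the file, assuming a POSIX shell with mktemp available. The function name delete_group and its argument order are hypothetical. The awk program in it holds each matching date line back until the fruit line has been read, so the date is dropped only when the fruit matches too.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
#   delete_group DATE FRUIT FILE
# Removes the 7-line group that starts with DATE followed by FRUIT,
# then writes the filtered text back into FILE.
delete_group() {
    tmp=$(mktemp) || return 1
    awk -v dt="$1" -v ft="$2" '
        # Line right after a matching date: decide whether to delete.
        p && NR == p + 1 { del = ($1 == ft); if (!del) print held }
        # Skip the fruit line and the 5 strings that follow it.
        del && NR <= p + 6 { next }
        # Matching date: remember it, but hold it back for now.
        $1 == dt { p = NR; held = $0; next }
        # Every other line is printed unchanged.
        1
        # A matching date on the last line has no fruit after it.
        END { if (p && p == NR) print held }
    ' "$3" > "$tmp" && cat "$tmp" > "$3"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling delete_group 2020-05-03 pear file should rewrite file without that group. The result is copied back with cat rather than moved into place, so the original file keeps its permissions and ownership. The file is only overwritten if awk exits successfully.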
