Wang

Reputation: 8173

match multiline pattern non-greedy with sed?

infile:

[start] cmd1
afadfadf
dafdf
[ok] cmd1
[-] cmd2
[-] cmd3
[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[-] cmd5
[-] cmd6
[start] cmd1
adfadd
dafa
dfdd33r55ae
[ok] cmd1
[-] cmd7
[start] cmd8
error...

[stop] cmd8
[-] cmd9
[start] cmd10
exit xx

[stop] cmd10
[-] cmd
[start] cmd1
[ok] cmd1

I would like to print every complete block of the form [start] ... [stop] cmd...

The result should be:

[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[start] cmd8
error...

[stop] cmd8
[start] cmd10
exit xx

[stop] cmd10

How can I do this with sed?

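sed -n '/\[start\]/I,/\[stop\]/I p' will not work: once the range is opened by a [start] line that has no matching [stop] (such as [start] cmd1), it does not close until it finds the next [stop], so all the unwanted lines in between are printed too.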
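A minimal sketch that reproduces the problem, assuming GNU sed (for the I flag) and a hypothetical trimmed-down input in which the first [start] has no matching [stop]:

```shell
#!/bin/sh
# Hypothetical trimmed-down input: [start] cmd1 never reaches a [stop].
sample='[start] cmd1
a
[ok] cmd1
[start] cmd4
b
[stop] cmd4'

# The range opens at [start] cmd1 and only closes at [stop] cmd4,
# so the unwanted cmd1 lines are printed along with the cmd4 block.
ranged=$(printf '%s\n' "$sample" | sed -n '/\[start\]/I,/\[stop\]/I p')
printf '%s\n' "$ranged"
```

All six input lines come out, not just the three lines of the cmd4 block.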

EDIT: After using @jaybee's sed script, I found it still has an issue when there are more [stop] lines than [start] lines, for example:

infile2

[start] cmd1
afadfadf
dafdf
[ok] cmd1
[-] cmd2
[-] cmd3
[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[-] cmd5
[-] cmd6
[start] cmd1
adfadd
dafa
dfdd33r55ae
[ok] cmd1
[-] cmd7
[start] cmd8
error...

[stop] cmd8
[-] cmd9
[stop] sum
[stop] cmd1
[stop] cmd2
[start] cmd10
exit xx

[stop] cmd10
[-] cmd
[start] cmd1
[ok] cmd1

It will still output the extra [stop] lines, like this:

[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[start] cmd8
error...

[stop] cmd8
[stop] cmd8
[-] cmd9
[stop] sum
[stop] sum
[stop] cmd1
[stop] cmd2
[stop] cmd2
[start] cmd10
exit xx

[stop] cmd10

So I decided to modify the sedscr to fix this:

#n
/^\[start\]/I {h;d}
# on [start]: start a new hold buffer with this line, then delete the pattern space
/^\[stop\]/I {
# on [stop]:
H;x
# append the line to the hold buffer, then swap hold buffer and pattern space
/^\[start\]/I{p;d}
# if the buffer begins with [start], it is a complete [start]...[stop] block: print it and start over with the next line
d
# otherwise discard it and start over with the next line
}
/^\[[^]]*\]/ {
# on any other control word such as [ok] or [-]:
h;d
# replace the hold buffer with the current line, then start over with the next line
}

H
# append ordinary (non-control) lines to the hold buffer

Now it works OK. Suggestions on how to make the script more concise are welcome.
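One possible condensed form, offered as a sketch: the same logic as the script above, packed into a single GNU sed command where -n takes the place of #n. GNU sed is assumed for the I flag; the function name extract_blocks and the sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only complete [start] ... [stop] blocks read from stdin.
#   [start]      -> replace the hold space with this line
#   [stop]       -> append, swap, and print only if the collected text begins with [start]
#   other [...]  -> reset the hold space to this line
#   other lines  -> append to the hold space
extract_blocks() {
    sed -n '/^\[start\]/I{h;d};/^\[stop\]/I{H;x;/^\[start\]/Ip;d};/^\[[^]]*\]/{h;d};H'
}

# Usage on a hypothetical sample with an unmatched [start] and a stray [stop].
printf '%s\n' '[start] cmd1' 'x' '[ok] cmd1' '[start] cmd4' 'y' '[stop] cmd4' \
    '[-] cmd5' '[stop] sum' '[start] cmd8' '' '[stop] cmd8' '[start] cmd1' |
    extract_blocks
```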

Upvotes: 3

Views: 1663

Answers (3)

potong

Reputation: 58420

This might work for you (GNU sed):

sed ':a;/^\[\(start\|stop\)\]/I{:b;$!{n;/^\[/ba;bb}};d' file

If a line begins with [start] or [stop] (in upper or lower case), print it and every following line that does not begin with [. When a line beginning with [ is reached, loop back to the beginning and check it again; any other line is deleted.

EDIT:

An alternate answer might be:

sed '/^\[start\]/I{h;d};H;/^\[stop\]/I{x;p;x};d' file
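The same alternate one-liner spread over several lines with comments, as a sketch assuming GNU sed (for comments inside the script and the I flag). The function name collect_blocks and the sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the alternate one-liner, expanded with comments.
collect_blocks() {
    sed '
        # [start]: overwrite the hold space with this line, then start the next cycle
        /^\[start\]/I{h;d}
        # every other line, [stop] included, is appended to the hold space
        H
        # [stop]: swap the collected block in, print it, and swap back
        /^\[stop\]/I{x;p;x}
        # suppress the automatic printing of the current line
        d
    '
}

printf '%s\n' '[start] a' 'x' '[ok] a' '[start] b' 'y' '[stop] b' | collect_blocks
```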

EDIT:

In light of the question being amended:

sed '/^\[start\]/I{:a;x;/^\(.*\[stop\][^\n]*\).*/Is//\1/p;x;h;d};H;$ba;d' file

Upvotes: 0

Jotne

Reputation: 41456

Here is an awk that should work:

awk '/^\[start\]/ {i=1;delete a}  {a[i++]=$0} /^\[stop\]/ {for (j=1;j<i;j++) print a[j]}' file
[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[start] cmd1
adfadd
dafa
dfdd33r55ae
[stop] cmd1
[start] cmd8
error...

[stop] cmd8
[start] cmd10
exit xx

[stop] cmd10

It only prints a block from [start] if the block has an end.
Every time it sees a [start], it resets the array and starts storing lines in it.
When a [stop] is found, it prints out the array.
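A variant sketch of the same buffer-and-print idea, assuming you also want case-insensitive matching and no output from stray [stop] lines (the infile2 case): it prints the buffer only while a [start] block is open, closes the block after printing, and drops the buffer when another [...] control line appears. The function name extract_blocks_awk is hypothetical; tolower is standard awk.

```shell
#!/bin/sh
# Hypothetical awk variant: buffer from [start], print at the first [stop],
# and drop the buffer when any other [...] control line appears.
extract_blocks_awk() {
    awk '
        tolower($0) ~ /^\[start\]/ { buf = $0; inblock = 1; next }   # begin a new block
        tolower($0) ~ /^\[stop\]/  { if (inblock) print buf "\n" $0  # complete block: print it
                                     inblock = 0; next }
        /^\[/                      { inblock = 0; next }             # other control word: abandon
        inblock                    { buf = buf "\n" $0 }             # body line: keep buffering
    '
}

printf '%s\n' '[start] cmd1' 'x' '[ok] cmd1' '[start] cmd4' 'y' '[stop] cmd4' \
    '[-] cmd5' '[stop] sum' '[start] cmd8' '' '[stop] cmd8' '[start] cmd1' |
    extract_blocks_awk
```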

Upvotes: 3

jaybee

Reputation: 955

Ok, so I suggest you use the hold buffer, flushing it whenever you see a fresh [start] and printing it whenever you see [stop]. This gives the following script:

#n
/^\[start\]/I {
    h;n
}
/^\[stop\]/I {
    H;x;p;n
}
H

Put this in a file, e.g. sedscr, and then run it to get the following result:

$ sed -f sedscr infile
[start] cmd4
dfdafadf
d
afasdf

daf
[stop] cmd4
[start] cmd1
adfadd
dafa
dfdd33r55ae
[stop] cmd1
[start] cmd8
error...

[stop] cmd8
[start] cmd10
exit xx

[stop] cmd10

Explanation

On seeing [start] at the beginning of a line (with the I flag, since it seems you want case-insensitive matching), put that line into the hold space, erasing its previous content (h), and then read in the next line (n).

When one sees [stop], append that line to the hold space (H), then swap pattern space and hold space (x) to print the pattern space (p), and then feed in the next line (n).

And on all other lines, simply append the line to the current hold space (H).
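To see the hold-space commands from this explanation in isolation, here is a minimal sketch on a hypothetical three-line input. It selects lines by number (1, 1!, $) rather than by [start]/[stop], so it only illustrates h, H, x and p and is not a drop-in replacement for the script above.

```shell
#!/bin/sh
# 1h      : copy line 1 into the hold space, replacing its previous content
# 1!H     : append every later line to the hold space, preceded by a newline
# ${x;p}  : on the last line, swap hold and pattern space and print the result
collected=$(printf '%s\n' '[start] a' 'body' '[stop] a' | sed -n '1h;1!H;${x;p}')
printf '%s\n' "$collected"
```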

By the way, the #n at the beginning of my script is the equivalent of -n on the command line: it asks sed not to write the pattern space to the output stream unless told to by a p command. It only has this effect when #n are the first two characters of the script file.
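A quick way to check this, as a sketch assuming a POSIX-like shell with mktemp available (the file and variable names are hypothetical):

```shell
#!/bin/sh
# Hypothetical demo: a script file whose first two characters are "#n"
# suppresses automatic printing, just like running sed with -n.
script=$(mktemp)
printf '#n\n/keep/p\n' > "$script"

with_hash_n=$(printf 'drop\nkeep\n' | sed -f "$script")
with_dash_n=$(printf 'drop\nkeep\n' | sed -n '/keep/p')
rm -f "$script"

printf '%s\n' "$with_hash_n" "$with_dash_n"
```

Both commands print only keep; without the #n line, the script file alone would print drop, keep and keep.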

Upvotes: 7
