Reputation: 29
On SUSE Linux, I'd like to find a complete section between a BEGIN string and an END string in a text file. I thought about using sed or awk.
Optionally, I would like to search for the next occurrence in another run.
My challenge is:
Example
something before ----BEGIN
first paragraph
Text Text Text
Text Text Text
Text Text Text
no ending pattern
something before ----BEGIN
second paragraph
Text Text Text
Text Text Text
Text Text Text
END---- some more text
no beginning pattern
Text Text Text
Text Text Text
END---- some more text
something before ----BEGIN
third paragraph
Text Text Text
Text Text Text
Text Text Text
no ending pattern
something before ----BEGIN
fourth paragraph
Text Text Text
Text Text Text
Text Text Text
END---- some more text
Text Text Text
I expect something like this:
----BEGIN
second paragraph
Text Text Text
Text Text Text
Text Text Text
END----
In another run I'd like to find the next complete section:
----BEGIN
fourth paragraph
Text Text Text
Text Text Text
Text Text Text
END----
In forums I could already find something like this:
tac < file.txt | sed '/END-----/,$!d;/-----BEGIN/q' | tac
But it finds only the last occurrence and doesn't cut the characters at the beginning and the end.
Unfortunately I'm not that experienced in using sed/awk or regex. I would appreciate it if you could give me some guidance!
Cheers, erd
Upvotes: 2
Views: 226
Reputation: 50750
Buffer lines between BEGIN and END, discarding the buffer whenever BEGIN occurs again, and print the buffer upon reaching END. Note that this assumes there's always a space before ----BEGIN and after END----.
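awk '/BEGIN$/,/^END/ {
    if (/BEGIN$/) {
        buf = $NF             # new start marker: restart the buffer
    }
    else if (/^END/) {
        print buf             # complete section: flush the buffer
        print $1              # then print the bare END---- marker
    }
    else {
        buf = (buf ORS $0)    # collect the lines in between
    }
}' file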
Upvotes: 1
Reputation: 58351
This might work for you (GNU sed and bash):
b='----BEGIN' e='END----' n=1
sed -En '/'$b'/{:a;N;/'$e'/!ba;x;s/^/x/;/^x{'$n'}$/!{x;b};x;s/.*('$b'.*'$e').*/\1/p}' file
This gathers up the lines between ----BEGIN and END---- and then uses greedy matching to find the last occurrence of ----BEGIN in the resulting string. The n variable selects which complete section is printed (the first in the example above). For the second section:
b='----BEGIN' e='END----' n=2
sed -En '/'$b'/{:a;N;/'$e'/!ba;x;s/^/x/;/^x{'$n'}$/!{x;b};x;s/.*('$b'.*'$e').*/\1/p}' file
Upvotes: 0
Reputation: 67467
It looks like the BEGIN/END markers are not reliable and you depend on empty lines between records, which awk supports in paragraph mode (RS set to the empty string).
$ awk -v n=2 -v RS= 'BEGIN {b="BEGIN"; e="END"; h="----"; s=".*"}
NR==n {sub(s h b, h b);
sub(e h s, e h);
print}' file
----BEGIN
second paragraph
Text Text Text
Text Text Text
Text Text Text
END----
Upvotes: 1
Reputation: 203169
$ cat tst.awk
BEGIN { beg="----BEGIN"; end="END----" }
sub(".*"beg,beg) { inBlock=1; buf="" }
inBlock {
    buf = buf $0 ORS
    if ( sub(end".*",end,buf) ) {
        print buf ORS
        inBlock=0
    }
}
$ awk -f tst.awk file
----BEGIN
second paragraph
Text Text Text
Text Text Text
Text Text Text
END----
----BEGIN
fourth paragraph
Text Text Text
Text Text Text
Text Text Text
END----
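The script above prints every complete section. To get only one section per run, as the question asks, a counter could be added. The following is a minimal sketch rather than part of the original answer: the function name nth_block and the variables n and cnt are hypothetical, and it assumes the same ----BEGIN and END---- markers.

```shell
# Hypothetical helper: print only the n-th complete ----BEGIN ... END---- section.
# Usage: nth_block N FILE
nth_block() {
    awk -v n="$1" '
        BEGIN { beg = "----BEGIN"; end = "END----" }
        # A start marker: drop the text before it and restart the buffer.
        sub(".*" beg, beg) { inBlock = 1; buf = "" }
        inBlock {
            buf = buf $0 ORS
            # An end marker closes the section: trim whatever follows it.
            if ( sub(end ".*", end, buf) ) {
                if ( ++cnt == n ) { print buf; exit }
                inBlock = 0
            }
        }
    ' "$2"
}
```

Calling nth_block 1 file should then print the "second paragraph" section from the sample, and nth_block 2 file the "fourth paragraph" section. If there are fewer than N complete sections, nothing is printed.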
Upvotes: 4
Reputation: 212198
It's not entirely clear if this will work, but making several assumptions based on the sample input, you might simply try:
awk '/BEGIN/ && /END/' RS= ORS='\n\n' input
That will filter out the records you want (again, I'm making assumptions about what you actually want based on the input sample), and then you can easily select records with a second awk. For example, to get the nth record, you can do something like:
N=2; awk '/BEGIN/ && /END/' RS= ORS='\n\n' input | awk 'NR==n' n=$N RS=
Put that in a loop with N as the loop counter and you have everything that you (seem to) want.
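The loop could look something like the sketch below. It relies on the same assumption as the commands above, namely that records are separated by blank lines. The function name print_complete_records, the variable total, and the "== record N ==" header line are illustrative additions, not part of the original commands.

```shell
# Sketch: print each complete record in turn, one per loop iteration.
# Assumes the records in the input file are separated by blank lines.
print_complete_records() {
    input=$1
    # Count the records that contain both markers.
    total=$(awk '/BEGIN/ && /END/ { c++ } END { print c + 0 }' RS= "$input")
    N=1
    while [ "$N" -le "$total" ]; do
        echo "== record $N =="
        # Same two-stage pipeline as above: filter, then pick record N.
        awk '/BEGIN/ && /END/' RS= ORS='\n\n' "$input" | awk 'NR==n' n="$N" RS=
        N=$((N + 1))
    done
}
```

Note that this selects whole records, so any text before ----BEGIN or after END---- on the marker lines is kept. Trimming it would need an extra sub() step.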
Upvotes: 1