Reputation: 21
I am trying to get all lines between the first occurrence of pattern 1 and the last occurrence of pattern 2. Both patterns are regexes.
Example input:
TEXT
TEXT
[SUN_START]
[SUN_END]
[MON_START]
TEXT
[MON_END]
[TUE_START]
[TUE_END]
[WED_START]
TEXT
[WED_END]
TEXT
TEXT
The output I am expecting is:
[SUN_START]
[SUN_END]
[MON_START]
TEXT
[MON_END]
[TUE_START]
[TUE_END]
[WED_START]
TEXT
[WED_END]
The patterns are XXX_START and XXX_END.
What I have got so far is:
cat /u01/app/oracle/admin/LNOPP1P/config/dbbackup_LNOPP1P.config | sed -n -e '/[[A-Z][A-Z][A-Z]_START]/,/[[A-Z][A-Z][A-Z]_END]/p'
But this does not keep the line breaks and displays everything together like this
[SUN_START]
[SUN_END]
[MON_START]
TEXT
[MON_END]
[TUE_START]
[TUE_END]
[WED_START]
TEXT
[WED_END]
I also want to make sure that it only matches lines that start with [[A-Z]_START], and the same for END.
Upvotes: 1
Views: 128
Reputation: 20980
This awk should work:
awk '/_START\]/{p=1} p{a = a $0 ORS}/_END\]/{printf "%s", a; a="";}' file
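Simple logic: the first _START line sets the flag p; while p is set, every line is appended to the buffer a; each _END line prints the buffer and clears it. Lines after the last _END are buffered but never printed.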
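As a rough sketch, the same one-liner can be spread over several lines with comments. The function name between_markers is a hypothetical wrapper added only for illustration; the awk program itself is unchanged.

```shell
# between_markers [FILE...]: hypothetical wrapper name, not part of the answer.
# Reads the named files, or standard input when no file is given.
between_markers() {
  awk '
    /_START\]/ { p = 1 }                    # first start marker: begin buffering
    p          { a = a $0 ORS }             # while buffering, append the current line
    /_END\]/   { printf "%s", a; a = "" }   # end marker: flush the buffer and reset it
  ' "$@"
}
```

Calling between_markers file should print everything from the first _START line through the last _END line. Whatever follows the last _END stays in the buffer and is dropped when awk exits.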
Upvotes: 1
Reputation: 203532
IMHO a two-pass approach without saving the contents in memory is the simplest and most robust:
$ awk '
NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
FNR>=beg && FNR<=end
' file file
[SUN_START]
[SUN_END]
[MON_START]
TEXT
[MON_END]
[TUE_START]
[TUE_END]
[WED_START]
TEXT
[WED_END]
Consider using [[:upper:]] instead of [A-Z] for portability across locales.
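A minimal sketch of the two-pass script with [[:upper:]] substituted, assuming an awk that supports POSIX character classes. The ^ anchor is an addition here, based on the asker's wish to match only lines that start with the marker, and print_block is a hypothetical wrapper name.

```shell
# print_block FILE: hypothetical helper name; FILE is read twice, so it must be a regular file.
print_block() {
  awk '
    # Pass 1: remember the first start-marker line and the last end-marker line.
    NR == FNR {
      if (/^\[[[:upper:]]+_START\]/ && !beg) beg = NR
      if (/^\[[[:upper:]]+_END\]/) end = NR
      next
    }
    # Pass 2: print only the lines inside that range.
    FNR >= beg && FNR <= end
  ' "$1" "$1"
}
```

With the anchor in place, a line such as "x [MON_START]" no longer counts as a start marker, although it is still printed if it falls inside the selected range.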
I just saw you had this comment under a different answer:
Is it simple to invert this selection? select everything but the bit selected by this AWK ?
and the answer is "of course", just change the condition at the end of the script:
$ awk '
NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
FNR<beg || FNR>end
' file file
TEXT
TEXT
TEXT
TEXT
or keep the original condition but make its action "next" and add a default "print" for every other line to hit:
$ awk '
NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
FNR>=beg && FNR<=end { next }
{ print }
' file file
TEXT
TEXT
TEXT
TEXT
Upvotes: 0
Reputation: 8164
A solution without awk, using grep:
grep -Pzo '(?s)\[([A-Z]{3})_START\].*\n.*\[\1_END\]' file | sed 's/\x00/\n\n/'
you get:
[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]
*based on @albfan's answer
Upvotes: 1
Reputation: 13249
You could use awk:
awk '/\[..._START\]/{p=1}/\[..._END\]/{print;p=0}p||!NF' file
The variable p is set when printing is needed.
!NF keeps blank lines.
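Spread over several lines, the one-liner above might be read as follows. This is only a commented sketch; print_sections is a hypothetical wrapper name, and the awk program is the same as in the answer.

```shell
# print_sections [FILE...]: hypothetical wrapper name, not part of the answer.
print_sections() {
  awk '
    /\[..._START\]/ { p = 1 }          # start marker: switch printing on
    /\[..._END\]/   { print; p = 0 }   # end marker: print it, then switch printing off
    p || !NF                           # print while p is set, or when the line is empty (NF == 0)
  ' "$@"
}
```

Note that p is switched off at every END marker, so non-blank lines that sit between one section's END and the next section's START are not printed; only empty lines survive there.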
Upvotes: 0