Reputation: 43
Given the following text file, I need to extract and print the lines between two patterns and also include the line above the first pattern and the line following the second:
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
I have found many solutions with sed and awk to extract between two tags, such as the following:
sed -n '/FIRST/,/SECOND/p' FileName
But how can I include the line before the first pattern and the line after the second?
Desired output:
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
Upvotes: 3
Views: 2285
Reputation: 74615
As you've asked for a sed/awk solution (and everyone is scared of ed ;-), here's one way you can do it in awk:
awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file
When the first pattern is matched, print the previous line p and set the print flag f. When the second pattern is matched, set c to 1. If f is 1 (true), the current line is printed. c--==0 becomes true on the line after the second pattern is matched, so that line is printed and then f is reset to 0.
Another way you can do this is by looping through the file twice:
awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file
The first pass records the line numbers of the two patterns. The second pass prints the lines in that range, widened by one line on each side.
The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
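As a rough sketch of that idea, the two numbers can be passed in as awk variables. The function name context_range and the variables m and n are illustrative assumptions, not part of the original answer. A guard on s and e is added so that nothing is printed when either pattern is missing.

```shell
# Hypothetical helper: print the FIRST..SECOND range plus m lines of
# context before it and n lines after it.
context_range() {
    # $1 = lines before, $2 = lines after, $3 = file (read twice)
    awk -v m="$1" -v n="$2" '
        NR == FNR {                       # first pass: record line numbers
            if (/FIRST/) s = NR
            else if (/SECOND/) e = NR
            next
        }
        # second pass: print the window, but only if both patterns were seen
        s && e && FNR >= s - m && FNR <= e + n
    ' "$3" "$3"
}
```

For example, context_range 2 1 file would print two lines before FIRST and one line after SECOND. As in the one-liner above, if the patterns occur several times, the last occurrence of each is the one recorded.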
To use shell variables instead of hard-coded patterns, you can pass the variables like this:
awk -v first="$first" -v second="$second" '...' file
Then use $0 ~ first instead of /FIRST/.
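Put together, the first awk script with variable patterns might look like the sketch below. The wrapper name print_range and its argument order are assumptions made for illustration only.

```shell
# Hypothetical wrapper around the flag-based awk script, with the two
# patterns supplied as arguments instead of being hard-coded.
print_range() {
    # $1 = start pattern, $2 = end pattern, $3 = file
    awk -v first="$1" -v second="$2" '
        $0 ~ first  { print p; f = 1 }   # print previous line, start printing
                    { p = $0 }           # remember the current line
        $0 ~ second { c = 1 }            # one more line is due after this one
        f                                # print while the flag is set
        c-- == 0    { f = 0 }            # stop after the line following the end
    ' "$3"
}
```

It could then be called as print_range "$first" "$second" file. Note that awk processes backslash escapes in -v assignments, so a pattern containing backslashes may need them doubled.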
Upvotes: 3
Reputation: 58420
This might work for you (GNU sed):
sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file
If the current line is not FIRST, save it in the hold space and delete the current line. If the line is FIRST, append it to the saved line, then print both and any further lines until SECOND, at which point one additional line is printed and the script exits.
Upvotes: 0
Reputation: 203502
This will work whether or not there are multiple ranges in your file:
$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd) gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }
$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
If you ever need to print more than 1 line before FIRST, change prev to an array. If you ever need to print more than 1 line after SECOND, change gotEnd to a count.
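One possible shape for that generalisation is sketched below. It is not the answerer's code: the names range_with_context, before, after, buf, inRange and tail are all illustrative, and the state handling is restructured slightly so that a count of 0 also works.

```shell
# Hypothetical generalisation: keep the last "before" lines in an array
# and count down "after" lines once SECOND has been seen.
range_with_context() {
    # $1 = lines before FIRST, $2 = lines after SECOND, $3 = file
    awk -v before="$1" -v after="$2" '
        /FIRST/ && !inRange {
            # replay the buffered lines preceding FIRST, oldest first
            for (i = NR - before; i < NR; i++)
                if (i in buf) print buf[i]
            inRange = 1
        }
        inRange              { print }            # inside the range
        !inRange && tail > 0 { print; tail-- }    # trailing context lines
        /SECOND/ && inRange  { inRange = 0; tail = after }
        { buf[NR] = $0; delete buf[NR - before] } # keep only the last lines
    ' "$3"
}
```

For example, range_with_context 2 2 file prints two lines of context on each side. If two ranges sit closer together than the requested context, some lines may be printed twice; this sketch does not try to handle that case.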
Upvotes: 1
Reputation: 53478
I would do it with Perl personally. We have the 'range operator', which we can use to detect whether we're between two patterns:
if ( m/FIRST/ .. /SECOND/ )
That's the easy part. What's a little less easy is 'catching' the preceding and next lines. So I set a $prev_line value, so that when I first hit that test, I know what to print. And I clear $prev_line, both so that it's empty when I print it again and so that I can spot the transition at the end of the range.
So something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;
        }
        $prev_line = $_;
    }
}
__DATA__
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
Upvotes: 0
Reputation: 945
Based on Tom's comment: if the file isn't large, we can just store it in an array and then loop over it:
awk '{a[++i]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<=e+1;i++) print a[i]}' file
Upvotes: 0
Reputation: 10039
sed '#n
H;$!d
x;s/\n/²/g
/FIRST.*SECOND/!b
s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
ta
s/²/\
/g
p' YourFile
(--posix)
#n : don't print unless explicitly requested (e.g. with p)
H;$!d : append each line to the hold buffer; if it is not the last line, delete the current line and start the next cycle
x;s/\n/²/g : load the buffer and replace every newline with another character (here I use ²), because POSIX sed does not allow [^\n]
/FIRST.*SECOND/!b : if the patterns are not present, quit without output
s/.*²\([^²]*²[^²]*FIRST\)/\1/ : remove everything before the line preceding your first pattern
:a : label for a goto (used later)
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/ : remove everything after the line following your second pattern. The match takes the longest string, so the last occurrence of the pattern is the reference
ta : if the last s/// made a substitution, go to label a. It cycles until it reaches the first SECOND pattern occurring in the file (after FIRST)
s/²/\
/g : put the newlines back
p : print the result
Upvotes: 0
Reputation: 44043
I'd say
sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename
That is:
/FIRST/ {        # If a line matches FIRST
  x              # swap hold buffer and pattern space,
  G              # append hold buffer to pattern space.
                 # We saved the last line before the match in the hold
                 # buffer, so the pattern space now contains the previous
                 # and the matching line.
  :a             # jump label for looping
  n              # print pattern space, fetch next line.
  /SECOND/! ba   # unless it matches SECOND, go back to :a
  n              # fetch one more line after the match
  q              # quit (printing that last line in the process)
}
h                # If we get here, it's before the block. Hold the current
                 # line for later use.
d                # don't print anything.
Note that BSD sed (as comes with Mac OS X and *BSD) is a bit picky about branching commands. If you're working on one of those platforms,
sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename
should work.
Upvotes: 2