John

Reputation: 43

Extracting lines between two patterns and including line above the first and below the second

Given the following text file, I need to extract and print the lines between two patterns, and also include the line above the first pattern and the line following the second:

asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa

I have found many solutions using sed and awk to extract the text between two tags, such as the following:

sed -n '/FIRST/,/SECOND/p' FileName

But how do I include the line before the first pattern and the line after the second?

Desired output:

line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern

Upvotes: 3

Views: 2285

Answers (7)

Tom Fenech

Reputation: 74615

As you've asked for an sed/awk solution (and everyone is scared of ed ;-), here's one way you can do it in awk:

awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file

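When the first pattern is matched, print the previous line p and set the print flag f. When the second pattern is matched, set c to 1. If f is 1 (true), the current line is printed. c--==0 becomes true on the line after the second pattern is matched, which resets f. (It is also true on the very first line of input, where c is still unset; that is harmless unless FIRST is on that line.)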

Another way you can do this is by looping through the file twice:

awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file

The first pass records the line numbers of the two patterns. The second pass prints the lines in that range, extended by one line on each side.

The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
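As a rough sketch of that idea, the two-pass command can take the amount of context as awk variables. The function name print_range and the variable names before and after are my own placeholders, not part of the command above, and the extra s && e guard (print nothing if either pattern is missing) is an addition of mine:

```shell
# print_range FILE BEFORE AFTER  (hypothetical helper name)
# Prints the FIRST..SECOND range plus BEFORE lines above and AFTER lines below.
print_range() {
  awk -v before="$2" -v after="$3" '
    NR == FNR {                       # first pass: record the match positions
      if (/FIRST/) s = NR
      else if (/SECOND/) e = NR
      next
    }
    # second pass: print the range widened by the requested context,
    # but only if both patterns were seen in the first pass
    s && e && FNR >= s - before && FNR <= e + after
  ' "$1" "$1"
}
```

For example, print_range file 1 1 should reproduce the desired output from the question, and print_range file 3 2 widens the window. As with the original command, if a pattern occurs more than once, the last occurrence wins.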

To use shell variables instead of hard-coded patterns, you can pass the variables like this:

awk -v first="$first" -v second="$second" '...' file

Then use $0 ~ first instead of /FIRST/.
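Put together, the first command with shell variables might look something like this. It is only a sketch: the function name extract_between is made up for illustration, and the values assigned to first and second are placeholders.

```shell
first='FIRST'      # placeholder values; any awk regular expression should work
second='SECOND'

# extract_between FILE  (hypothetical helper name)
extract_between() {
  awk -v first="$first" -v second="$second" '
    $0 ~ first  { print p; f = 1 }   # start: emit the remembered previous line
                { p = $0 }           # remember the current line for later
    $0 ~ second { c = 1 }            # end seen: allow exactly one more line
    f                                # print while the flag is set
    c-- == 0    { f = 0 }            # clear the flag after the trailing line
  ' "$1"
}
```

One thing to keep in mind is that awk processes backslash escapes in values passed with -v, so a pattern containing backslashes may need them doubled.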

Upvotes: 3

potong

Reputation: 58420

This might work for you (GNU sed):

sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file

If the current line does not match FIRST, save it in the hold space and delete it from the pattern space. If the line matches FIRST, append it to the saved line, then print both and any further lines until SECOND, at which point one additional line is printed and the script exits.

Upvotes: 0

Ed Morton

Reputation: 203502

This will work whether or not there are multiple ranges in your file:

$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd)   gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }

$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern

If you ever need to print more than 1 line before FIRST, change prev to an array. If you ever need to print more than 1 line after SECOND, change gotEnd to a count.
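A minimal sketch of that generalisation follows. The wrapper name context_ranges and the variables before and after are hypothetical, and it assumes the ranges are far enough apart that their context lines don't overlap (otherwise a line could be printed twice):

```shell
# context_ranges FILE BEFORE AFTER  (hypothetical helper name)
context_ranges() {
  awk -v before="$2" -v after="$3" '
    /FIRST/ && !gotBeg {
      # replay up to "before" buffered lines, oldest first
      for (i = NR - before; i < NR; i++)
        if (i in prev) print prev[i]
      gotBeg = 1
    }
    gotBeg {
      print
      if (gotEnd && --gotEnd == 0) gotBeg = 0   # count down the trailing lines
      if (/SECOND/) {
        gotEnd = after + 0                      # number of lines still to print
        if (gotEnd == 0) gotBeg = 0             # no trailing context requested
      }
    }
    # keep only the most recent "before" lines in the buffer
    { prev[NR] = $0; delete prev[NR - before] }
  ' "$1"
}
```

Calling context_ranges file 1 1 should behave like tst.awk above, while larger numbers widen the context on either side.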

Upvotes: 1

Sobrique

Reputation: 53478

I would do it with Perl personally. We have the 'range operator' which we can use to detect if we're between two patterns:

if ( m/FIRST/ .. /SECOND/ ) 

That's the easy part. What's a little less easy is 'catching' the preceding and following lines. So I keep a $prev_line value, so that when I first hit that test, I know what to print. I then clear $prev_line, both so that it's empty when I print it again on later lines of the range, and so that I can spot the transition at the end of the range.

So something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;
        }
        $prev_line = $_;
    }
}

__DATA__ 
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa

Upvotes: 0

Dieselist

Reputation: 945

Based on Tom's comment: if the file isn't large, we can just store it in an array and then loop over it:

awk '{a[NR]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<=e+1;i++) if (i in a) print a[i]}' file

Upvotes: 0

NeronLeVelu

Reputation: 10039

sed '#n
   H;$!d
   x;s/\n/²/g
   /FIRST.*SECOND/!b
   s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
   s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
   ta
   s/²/\
/g
   p' YourFile
  • POSIX sed version (with GNU sed, use --posix)
  • A SECOND pattern on the same line as FIRST is also matched; this is easy to adapt to require at least one newline in between
    • #n : don't print unless explicitly requested (for example with p)
    • H;$!d : append each line to the hold buffer; if it is not the last line, delete the current line and start the next cycle
    • x;s/\n/²/g : load the buffer and replace every newline with another character (here I use ²), because POSIX sed does not allow [^\n]
    • /FIRST.*SECOND/!b : if the patterns are not present, quit without output
    • s/.*²\([^²]*²[^²]*FIRST\)/\1/ : remove everything before the line preceding the first pattern
    • :a : label for a goto (used later)
    • s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/ : remove everything after the line following the second pattern. The match is greedy, so the last occurrence of the pattern is used as the reference
    • ta : if the last s/// made a substitution, go to label a. This cycles until the first SECOND pattern in the file (after FIRST) is reached
    • s/²/\ /g (a backslash followed by a literal newline in the script) : put the newlines back
    • p : print the result

Upvotes: 0

Wintermute

Reputation: 44043

I'd say

sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename

That is:

/FIRST/ {        # If a line matches FIRST
  x              # swap hold buffer and pattern space,
  G              # append hold buffer to pattern space.
                 # We saved the last line before the match in the hold
                 # buffer, so the pattern space now contains the previous
                 # and the matching line.
  :a             # jump label for looping
  n              # print pattern space, fetch next line.
  /SECOND/! ba   # unless it matches SECOND, go back to :a
  n              # fetch one more line after the match
  q              # quit (printing that last line in the process)
}
h                # If we get here, it's before the block. Hold the current
                 # line for later use.
d                # don't print anything.

Note that BSD sed (as comes with Mac OS X and *BSD) is a bit picky about branching commands: it expects a label or branch target to end at a newline rather than at a semicolon. If you're working on one of those platforms,

sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename

should work.

Upvotes: 2
