Reputation: 65

Print several lines between patterns (first pattern not unique)

I need help with sed, awk, grep, or whatever else could solve my task. I have a large file and I need to extract multiple sequential lines from it.

I have a start pattern: <DN>

and an end pattern: </GR>

and several lines in between, like this:

<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

I've tried this:

sed -n '/\<DN\>/,/\<\/GR\>/p'

and several other ones (using awk and sed). It works okay, but the problem is that the source file may contain a block that starts with a <DN> line but has no </GR> at its end, followed by a block that starts with another <DN> and ends normally with </GR>:

<DN>234</DN> - unneeded DN
<AB>sdfsd</AB>
<DC>456456</DC>
<EF>6575675 sdfsd</EF>
....really large piece of unwanted text here....

<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

How can I extract only the needed lines and ignore the garbage pieces of the log that contain <DN> without a closing </GR>?

Next, I need to convert each multiline block from <DN> to </GR> into a single line, so that the output file has one line per block, starting with <DN> and ending with </GR>. Any help would be appreciated; I'm stuck.

Upvotes: 1

Answers (5)

potong

Reputation: 58371

This might work for you (GNU sed):

sed -n '/<DN>/{h;b};x;/./G;x;/<\/GR/{x;/./p;z;x}' file

Use the hold space to store lines between <DN> and </GR>.
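Below is a commented, multi-line sketch of the same hold-space technique, assuming GNU sed (the `z` command, which empties the pattern space, is a GNU extension). Unlike the one-liner, it also replaces the embedded newlines with spaces before printing, so each block comes out on a single line as the question asks. The function name `join_blocks` and the space separator are illustrative choices, not part of the original answer.

```shell
# join_blocks is a hypothetical helper name; it assumes GNU sed (for z).
join_blocks() {
  sed -n '
    # A <DN> line opens a new block: overwrite the hold space, go to next line.
    /<DN>/{
      h
      b
    }
    # Swap so the pattern space holds the block collected so far.
    x
    # If a block is open (hold was non-empty), append the current line to it.
    /./G
    # Swap back: current line in pattern space, updated block in hold space.
    x
    # A </GR> line closes the block.
    /<\/GR>/{
      x
      # If a block was open, print it as one line (newlines become spaces).
      /./{
        s/\n/ /g
        p
      }
      # Empty the hold space so stray lines are not collected afterwards.
      z
      x
    }
  ' "$@"
}
```

Usage: `join_blocks file` (or pipe input to it). A blank line inside a block shows up as two consecutive spaces in the joined output.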

Upvotes: 2

Nahuel Fouilleul

Reputation: 19315

With bash:

fun () 
{ 
    local line output;
    while IFS= read -r line; do
        if [[ $line =~ ^'<DN>' ]]; then
            output=$line;
        else
            if [[ -n $output ]]; then
                output=$output$'\n'$line;
                if [[ $line =~ '</GR>'$ ]]; then
                    echo "$output";
                    output=;
                fi;
            fi;
        fi;
    done
}

fun <file
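For the single-line output the question asks for, one possible variant (not part of the original answer) appends each line with a space instead of a newline. This sketch uses `case` instead of `[[ =~ ]]` so it also runs under a plain POSIX `sh`, and it also handles a last line with no trailing newline. The name `fun_joined` is hypothetical.

```shell
# fun_joined is a hypothetical POSIX-sh variant of fun that joins
# each <DN>...</GR> block into one space-separated line.
fun_joined() (
    # The ( ) body runs in a subshell, so line and output stay local.
    output=
    # The || test also processes a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '<DN>'*)
                # A new <DN> discards any unfinished block.
                output=$line
                ;;
            *)
                if [ -n "$output" ]; then
                    # Inside a block: append with a single space.
                    output="$output $line"
                    case $line in
                        *'</GR>')
                            # Block complete: print it and reset.
                            printf '%s\n' "$output"
                            output=
                            ;;
                    esac
                fi
                ;;
        esac
    done
)
```

Usage: `fun_joined <file`.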

Upvotes: 1

ooga

Reputation: 15501

awk '
  /^<DN>/ { n = 1 }

  n { lines[n++] = $0 }

  n && /<\/GR>$/ {
    for (i=1; i<n; i++) printf "%s", lines[i]
    print ""
    n = 0
  }
' file

Upvotes: 1

Etan Reisner

Reputation: 80921

awk '
# Lines that start with "<DN>" start our matching.
/^<DN>/ {
    # If we saw a start without a matching end, throw away everything saved so far.
    if (dn) {
        d=""
    }
    # Mark being in a <DN> element.
    dn=1
    # Save the current line.
    d=$0
    next
}

# Lines that end with "</GR>" end our matching (but only if we are currently in a match).
dn && /<\/GR>$/ {
    # We are not in a <DN> element anymore.
    dn=0
    # Print out the saved lines and the current line.
    printf "%s%s%s\n", d, OFS, $0
    # Reset our saved contents.
    d=""
    next
}

# If we are in a <DN> element and have saved contents append the current line to the contents (separated by OFS).
dn && d {
    d=d OFS $0
}
' file

Upvotes: 1

Avinash Raj

Reputation: 174696

You could use the pcregrep tool for this.

$ pcregrep -o -M '(?s)(?<=^|\s)<DN>(?:(?!<DN>).)*?</GR>(?=\n|$)' file
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

Upvotes: 0
