Suresh Anbarasan

Reputation: 1033

Awk command to grep data between two multi-line patterns based on a condition

Sample.xml :

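`
<tester>
   <level1 id="1"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 1</level3>
   <level4>lvl4 of id 1</level4>
   <level5> </level5>
</tester>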

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>

`
For the sample.xml above, I need to get the level3 and level4 tags only when the id in level1 is 2. For example, searching for id=2 should give the following output:

<level3>lvl3 of id 2 </level3>
<level4> lvl4 of id 2</level4>

<level3>lvl3 of id 2 2nd occurance</level3>
<level4>lvl4 of id 2 2nd occurance</level4>

Upvotes: 0

Views: 2995

Answers (3)

perreal

Reputation: 97938

Using sed:

sed -n '/<tester>/{n;/<level1[ ]*id="2"/{n;n;N;p}}' input

Explanation:

sed                  # execute sed
-n                   # do not print unless explicitly stated
/<tester>/           # if this line contains <tester>
{                    # then 
n;                   # skip the line (read new line over the old line)
/<level1[ ]*id="2"/  # if this line contains <level1 [spaces] id="2"
{                    # then
n;n;                 # skip it, and skip the next line
N;                   # read another line but this time append
p                    # print the buffer
}                    # end if
}                    # end if
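
To see the command in action, here is a small self-contained run on a two-block sample. It is only a sketch: the temporary file and the `sample` and `sed_out` variable names are illustrative, and it assumes GNU sed (some other sed implementations want a `;` before each closing `}`). Note that the command depends on the fixed line layout, with level3 exactly two lines below level1, and that it keeps the original indentation.

```shell
# Build a two-block sample in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>
EOF

# Same command as above; only the id="2" block produces output.
sed_out=$(sed -n '/<tester>/{n;/<level1[ ]*id="2"/{n;n;N;p}}' "$sample")
printf '%s\n' "$sed_out"
rm -f "$sample"
```

If the tags can appear in a different order or with extra lines in between, the line counting (`n;n;N`) no longer lines up, and a pattern-based approach is the safer choice.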

Upvotes: 2

William Pursell

Reputation: 212228

When working with blocks in awk, it is often convenient to set RS to the empty string (paragraph mode), so that each blank-line-separated block becomes one record. I believe this does what you want:

awk '/id="2"/{print ""; n = split($0, a, "\n"); for (i = 1; i <= n; i++)
    if (a[i] ~ /level[34]/) print a[i]}' RS= input
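
A related way to write the same idea, as a sketch, is to set FS to a newline as well, so that every line of a block becomes one field and can be walked in file order with a plain counting loop. The temporary file and the `sample` and `para_out` names below are illustrative; the approach assumes the blocks are separated by blank lines, as in the question.

```shell
# Three blank-line-separated blocks in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>
EOF

# RS = ""   : each blank-line-separated block is one record.
# FS = "\n" : each line of the block is one field, so $1..$NF are in file order.
para_out=$(awk 'BEGIN { RS = ""; FS = "\n" }
/<level1 id="2">/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /<level[34]>/) {
            line = $i
            sub(/^[ \t]+/, "", line)   # drop the indentation
            print line
        }
    print ""                           # blank line after each matching block
}' "$sample")
printf '%s\n' "$para_out"
rm -f "$sample"
```

Working on a copy (`line`) rather than on `$i` avoids rebuilding the record while it is being scanned.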

Upvotes: 0

Steve

Reputation: 54392

I'd recommend an XML parser like xmlstarlet. However, that's not to say it can't be done using awk. Here's one way. Run like:

awk -f script.awk file

Contents of script.awk:

/<tester>/ {
    r=""
    f=1
}

f && /<level1 id="2">/ {
    g=1
}

g && /<level[34]>/ {
    sub(/^[ \t]+/, "")
    r = r $0 ORS
}

/<\/tester>/ {
    if (g && r) {
        print r
    }
    f=g=0
}

Results:

<level3>lvl3 of id 2 </level3>
<level4> lvl4 of id 2</level4>

<level3>lvl3 of id 2 2nd occurance</level3>
<level4>lvl4 of id 2 2nd occurance</level4>

Alternatively, here's the one-liner:

awk '/<tester>/ { r=""; f=1 } f && /<level1 id="2">/ { g=1 } g && /<level[34]>/ { sub(/^[ \t]+/, ""); r = r $0 ORS } /<\/tester>/ { if (g && r) print r; f=g=0 }' file
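
Since the question talks about searching for a given id, the hard-coded `id="2"` can also be passed in from the shell. The following is a minimal sketch of that variation, not part of the script above: the awk variable `id`, the shell variables `want_id`, `sample` and `id_out`, and the temporary file are all illustrative names. It uses `index()` rather than a dynamic regex so the id is compared as plain text.

```shell
# Two-block sample in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>
EOF

want_id=3   # the id to search for

id_out=$(awk -v id="$want_id" '
/<tester>/   { r = ""; g = 0 }                             # new block: reset state
/<level1 /   { g = (index($0, "id=\"" id "\"") > 0) }      # does this block have the id?
g && /<level[34]>/ {
    line = $0
    sub(/^[ \t]+/, "", line)                               # strip leading whitespace
    r = r line ORS
}
/<\/tester>/ { if (g && r != "") printf "%s\n", r; g = 0 } # flush block plus a blank line
' "$sample")
printf '%s\n' "$id_out"
rm -f "$sample"
```

Changing `want_id` to 2 selects the first block instead.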

Upvotes: 0
