Dunams

Reputation: 112

Extract specific XMLs from log file

I have large log files (around 50 MB each) that contain Java debug information plus all kinds of XML responses.

Here's an example of what I'm trying to extract from the log:

<envelope>
    <response>
        <ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
        <ATTR name="status" value="Activated"/>
        <ATTR name="datecreated" value="2018/10/04 09:39:05"/>
    </response>
</envelope>

I only need the XML blocks in which the uniqueid attribute contains "12345" and the status attribute is set to "Activated".

Using sed, I can extract all the envelopes. I currently loop over them and use a regex to check each one against the conditions above.

sed -n '/<envelope>/,/<\/envelope>/p' logfile

What would be a proper solution to extract what I need from the file?

Thanks!

Upvotes: 0

Views: 582

Answers (1)

karakfa

Reputation: 67557

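Assuming your XML is formatted as shown, this should work:

$ awk '/<envelope>/               {line=$0; u=s=0; next}
       line                       {line=line ORS $0}
       /uniqueid/ && $3~/12345/   {u=1}
       /status/ && $3~/Activated/ {s=1}
       /<\/envelope>/             {if (u && s) print line; line=""}' file

The opening tag starts accumulating lines. A matching uniqueid line sets one flag and a matching status line sets another. At the closing tag, the record is printed only if both flags are set, and the buffer is cleared so that log lines between envelopes are not collected.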
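
If the ID and status change between runs, the same idea can be wrapped in a small shell function. This is only a sketch: extract_envelopes and the awk variables id, st, and buf are names made up here, and it still assumes one tag per line as in the sample. It also checks both the uniqueid and the status condition from the question.

```shell
# Hypothetical helper: extract_envelopes ID STATUS FILE
# Prints every <envelope> block whose uniqueid contains ID
# and whose status value is exactly STATUS.
extract_envelopes() {
    awk -v id="$1" -v st="$2" '
        /<envelope>/   { buf = $0; u = s = 0; next }   # start a new block
        buf != ""      { buf = buf ORS $0 }            # collect lines inside a block
        buf != "" && /name="uniqueid"/ && index($0, id)                 { u = 1 }
        buf != "" && /name="status"/   && index($0, "value=\"" st "\"") { s = 1 }
        /<\/envelope>/ { if (u && s) print buf; buf = "" }  # print on match, then reset
    ' "$3"
}
```

Called as extract_envelopes 12345 Activated logfile, it prints only the matching blocks. index() does a plain substring test, so the ID is not interpreted as a regex.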

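With gawk, you can do this instead:

$ gawk -F'\n' -v RS='</envelope>\n'    \
    '$3~/uniqueid.*12345/ && $4~/status.*Activated/{printf "%s%s", $0, RT}' file

RT is a gawk extension that holds the text matched by RS. printf is used here because print $0, RT would put a space before the closing tag and add an extra newline. The field numbers assume each record starts at <envelope>, so any log lines between envelopes will shift them.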

Upvotes: 1
