Extract specific XMLs from log file

Question

I have large log files (around 50 MB each) that contain Java debug information plus all kinds of XML responses.

Here's an example of something I'm trying to extract from the log

I need only the XMLs whose uniqueid attribute contains "12345" and whose status attribute is set to "Activated".

Using sed I'm able to extract all the envelopes, and currently I loop over them and use a regex to check whether each one meets the conditions above.

sed -n '/<envelope>/,/<\/envelope>/p' logfile

What would be a proper solution to extract what I need from the file?

Thanks!

karakfa · Accepted Answer

Assuming your XML is formatted as shown, this should work:

$ awk '/<envelope>/ {line=$0; p=0; next}
             line   {line=line ORS $0}
    /uniqueid/ && $3~/12345/ {p=1}
   /<\/envelope>/ && p {print line}' file

At the opening tag, start accumulating lines. If the desired line is found, set the flag. At the closing tag, print the record if the flag is set.
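The flag-and-accumulate approach described above can be extended to test both conditions from the question (uniqueid and status) without relying on field positions. The following is only a sketch: the sample log, the `item` element, and the `extract_envelopes` helper are hypothetical, and it assumes `<envelope>` and `</envelope>` each sit on their own line with attributes written as `name="value"`.

```shell
#!/bin/sh
# Hypothetical sample log; this envelope layout is an assumption,
# not the asker's real data.
cat > sample.log <<'EOF'
2016-01-01 12:00:00 DEBUG some java debug line
<envelope>
  <header>
    <item uniqueid="abc-12345-x" status="Activated"/>
  </header>
</envelope>
2016-01-01 12:00:01 DEBUG another line
<envelope>
  <header>
    <item uniqueid="abc-12345-y" status="Pending"/>
  </header>
</envelope>
<envelope>
  <header>
    <item uniqueid="abc-99999-z" status="Activated"/>
  </header>
</envelope>
EOF

# Hypothetical helper: print every envelope in file $1 that satisfies
# both conditions somewhere between its opening and closing tags.
extract_envelopes() {
  awk '
    # Opening tag: start a new buffer and reset both flags.
    /<envelope>/   { buf = $0; inside = 1; id = 0; st = 0; next }
    # Inside an envelope: append the current line to the buffer.
    inside         { buf = buf ORS $0 }
    # Set one flag per condition when the matching attribute is seen.
    inside && /uniqueid="[^"]*12345/ { id = 1 }
    inside && /status="Activated"/   { st = 1 }
    # Closing tag: print the buffer only if both flags are set.
    /<\/envelope>/ { if (inside && id && st) print buf; inside = 0 }
  ' "$1"
}

extract_envelopes sample.log
```

Keeping a separate flag per condition means the two attributes may appear on different lines of the same envelope, and the Java debug lines between envelopes are skipped because nothing is buffered while `inside` is 0.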

With gawk you can do this instead:

$ awk -F'\n' -v RS='</envelope>\n'    \
    '$3~/uniqueid.*12345/ && $4~/status.*Activated/{print $0, RT}' file

There will be an extra newline, though.
