Reputation: 112
I have large log files (around 50 MB each) that contain Java debug information plus all kinds of XML responses.
Here's an example of what I'm trying to extract from the log:
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
<ATTR name="datecreated" value="2018/10/04 09:39:05"/>
</response>
</envelope>
I only need the envelopes in which the uniqueid attribute contains "12345" and the status attribute is set to "Activated".
Using sed I can extract all the envelopes, and I currently loop over them and check each one with a regex for the two conditions above:
sed -n '/<envelope>/,/<\/envelope>/p' logfile
What would be a proper solution to extract what I need from the file?
Thanks!
Upvotes: 0
Views: 582
Reputation: 67557
Assuming your XML is formatted as shown, this should work:
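$ awk '/<envelope>/ {line=$0; id=st=0; next}
line {line=line ORS $0}
/uniqueid/ && $3~/12345/ {id=1}
/"status"/ && $3~/"Activated"/ {st=1}
/<\/envelope>/ {if (id && st) print line; line=""}' file
On the opening tag, start accumulating lines. When the uniqueid line contains 12345, or the status line is set to Activated, set the corresponding flag. On the closing tag, print the record if both flags are set, then clear the buffer so that log lines between envelopes are not accumulated.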
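To try the accumulate-and-flag approach on something small, here is a self-contained sketch that checks both conditions. The sample log contents and the names log, result, buf, id and st are made up for illustration, and it matches on the whole line rather than on $3.

```shell
# Hypothetical sample log: debug noise around two envelopes.
log=$(mktemp)
cat > "$log" <<'EOF'
2018-10-04 09:39:05 DEBUG some java debug output
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
<ATTR name="datecreated" value="2018/10/04 09:39:05"/>
</response>
</envelope>
2018-10-04 09:39:06 DEBUG more debug output
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_2"/>
<ATTR name="status" value="Pending"/>
<ATTR name="datecreated" value="2018/10/04 09:40:00"/>
</response>
</envelope>
EOF

# Buffer each envelope; print it only if both conditions were seen inside it.
result=$(awk '
  /<envelope>/                             {buf = $0; id = st = 0; next}
  buf != ""                                {buf = buf ORS $0}
  /name="uniqueid"/ && /value="[^"]*12345/ {id = 1}
  /name="status"/ && /value="Activated"/   {st = 1}
  /<\/envelope>/                           {if (id && st) print buf; buf = ""}
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Only the first envelope should be printed; the second one has the right id but a different status.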
With gawk, you can do this instead:
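$ gawk -F'\n' -v RS='</envelope>\n' \
'$3~/uniqueid.*12345/ && $4~/status.*Activated/ {printf "%s%s", $0, RT}' file
This addresses the attributes by their line position within the record, so it relies on each record starting at <envelope>, with no other log lines in between. printf is used instead of print $0, RT, which would put a space before the closing tag and add an extra newline.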
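If the log has other lines between envelopes, positional fields such as $3 and $4 no longer line up. A hedged sketch of a record-based variant that matches against the whole record instead is below. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves this unspecified), and the sample log and the names log, matches, start and rec are invented for the example.

```shell
# Hypothetical sample log: noise lines around two envelopes.
log=$(mktemp)
cat > "$log" <<'EOF'
09:39:05 DEBUG request sent
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
</response>
</envelope>
09:39:06 DEBUG request sent
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_99999_1"/>
<ATTR name="status" value="Activated"/>
</response>
</envelope>
EOF

# Each record ends at a closing envelope tag; test the record as a whole.
matches=$(awk -v RS='</envelope>\n' '
  {
    start = index($0, "<envelope>")   # skip any log text before the envelope
    if (start == 0) next              # trailing text with no envelope in it
    rec = substr($0, start)
    if (rec ~ /name="uniqueid" value="[^"]*12345/ &&
        rec ~ /name="status" value="Activated"/)
      printf "%s</envelope>\n", rec   # RS consumed the closing tag; put it back
  }
' "$log")

printf '%s\n' "$matches"
rm -f "$log"
```

Writing the closing tag back by hand avoids relying on gawk's RT, at the cost of hard-coding the tag in two places.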
Upvotes: 1