Reputation: 21
I have a set of 90 XML files that are refreshed every two hours. Each contains 10 chunks with 4 items of data from 10 sensors. Using xml_grep --root "data" I can strip out the repeated header info in each file and produce a single file with just the data I am interested in. However, I want to go one better and produce 10 files, one for each sensor, but I am stuck here.
I understand that multiple --root <cond> instructions can be used, but I cannot find the format to achieve this. A chunk looks like this:
<data>
<sensor>0</sensor>
<a123>123456</a123>
<a124>123457</a124>
<a125>123458</a125>
<a126>123459</a126>
</data>
The man page does not lead me to a solution, and I cannot find any tutorial that goes into such detail. I have no control over the production of the source files.
Upvotes: 0
Views: 89
Reputation: 2020
If you are open to using an XSLT transform, the following might work for you.
XSLT contents:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
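<xsl:output method="html" omit-xml-declaration="yes"/>
<!-- Sensor number to select; supplied on the command line via xsltproc's stringparam option -->
<xsl:param name="sensor_num"/>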
<xsl:template match="/">
<xsl:apply-templates select="//data[sensor=$sensor_num]"/>
</xsl:template>
<xsl:template match="data">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
The transform takes the sensor number as a parameter, so it can be run once per sensor in a Bash loop. The loop uses Bash globbing to locate the source XML files.
for i in {0..4} ; do xsltproc --stringparam sensor_num "$i" m.xslt *.xml > output/sensor_$i.xml ; done
This command generates XML files in the output directory, one file per sensor number. (Note that for testing there were 3 source XML files containing sensor data for sensor numbers 0-4.)
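The range {0..4} matches the test data only; for other inputs, the sensor numbers could be read from the files themselves. The following is a minimal sketch, not a definitive implementation. It assumes each <sensor>N</sensor> element sits on a single line, as in the sample chunk, and that grep supports -h and -o (GNU and BSD grep do). The names list_sensors and split_by_sensor are hypothetical helpers, not part of xsltproc.

```shell
# Sketch: derive the sensor numbers from the data instead of hard-coding them.
# list_sensors and split_by_sensor are hypothetical helper names.

# Print each distinct sensor number found in the given XML files, one per line.
# Assumes <sensor>N</sensor> appears on a single line with no inner whitespace.
list_sensors() {
    grep -ho '<sensor>[0-9][0-9]*</sensor>' "$@" |
        sed 's/<[^>]*>//g' |
        sort -un
}

# Run the stylesheet once per sensor found, writing output/sensor_N.xml.
# Usage: split_by_sensor m.xslt file1.xml file2.xml ...
split_by_sensor() {
    local xslt=$1
    shift
    mkdir -p output
    local n
    for n in $(list_sensors "$@"); do
        xsltproc --stringparam sensor_num "$n" "$xslt" "$@" > "output/sensor_$n.xml"
    done
}
```

Calling split_by_sensor m.xslt *.xml would then write one file per sensor number present in the sources, and create the output directory if it does not exist yet.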
The contents of the output/sensor_N.xml files look like:
output/sensor_0.xml
<data>
<sensor>0</sensor>
<a123>123456</a123>
<a124>123457</a124>
<a125>123458</a125>
<a126>123459</a126>
</data>
<data>
<sensor>0</sensor>
<a123>223456</a123>
<a124>223457</a124>
<a125>223458</a125>
<a126>223459</a126>
</data>
<data>
<sensor>0</sensor>
<a123>323456</a123>
<a124>323457</a124>
<a125>323458</a125>
<a126>323459</a126>
</data>
For clarity, here is the output from head -n 3 output/*.xml:
==> output/sensor_0.xml <==
<data>
<sensor>0</sensor>
<a123>123456</a123>
==> output/sensor_1.xml <==
<data>
<sensor>1</sensor>
<a123>123456</a123>
==> output/sensor_2.xml <==
<data>
<sensor>2</sensor>
<a123>123456</a123>
==> output/sensor_3.xml <==
<data>
<sensor>3</sensor>
<a123>123456</a123>
==> output/sensor_4.xml <==
<data>
<sensor>4</sensor>
<a123>123456</a123>
Upvotes: 1