Sabreur

Reputation: 21

How do I write an xml_grep command with multiple --root options?

I have a set of 90 XML files that are refreshed every 2 hours, each containing 10 chunks with 4 items of data from 10 sensors. Using xml_grep --root "data", I can strip out the repeated header information in each file and produce a single file with just the data I am interested in. However, I want to go one better and produce 10 files, one for each sensor, and this is where I am stuck.

I understand that multiple --root <cond> options can be used, but I cannot find the syntax to achieve this. A chunk looks like this:

    <data>
        <sensor>0</sensor>
        <a123>123456</a123>
        <a124>123457</a124>
        <a125>123458</a125>
        <a126>123459</a126>
    </data>

The man page does not lead me to a solution, and I cannot find any tutorial that goes into this much detail. I have no control over how the source files are produced.

Upvotes: 0

Views: 89

Answers (1)

j_b

Reputation: 2020

If you are open to using an XSLT transform, the following might work for you.

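XSLT contents (saved as m.xslt, the name used in the loop below):

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

      <xsl:output method="html" omit-xml-declaration="yes"/>

      <!-- Sensor number to select; supplied with xsltproc --stringparam -->
      <xsl:param name="sensor_num"/>

      <!-- Select only the data chunks whose sensor matches the parameter -->
      <xsl:template match="/">
          <xsl:apply-templates select="//data[sensor=$sensor_num]"/>
      </xsl:template>

      <!-- Copy each matching chunk to the output unchanged -->
      <xsl:template match="data">
        <xsl:copy-of select="."/>
      </xsl:template>

    </xsl:stylesheet>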

The transform relies on a parameter, sensor_num, that holds the sensor number, so it can be run once per sensor in a Bash loop. The loop uses Bash globbing to locate the source XML files.

    # The output directory must exist before the redirect can write to it
    mkdir -p output
    for i in {0..4} ; do xsltproc --stringparam sensor_num "$i" m.xslt *.xml > "output/sensor_$i.xml" ; done

This command generates XML files in the output directory, one file per sensor number. (Note that for testing there were 3 source XML files containing sensor data for sensor numbers 0-4; with more sensors, widen the range in the loop to match.)
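
To make the selection logic concrete, here is a minimal sketch of the same per-sensor split using only Python's standard library instead of xsltproc. It is an alternative illustration, not what the XSLT does internally: the function name split_by_sensor is hypothetical, and it assumes every source file is well-formed XML whose <data> elements each contain a <sensor> child. It reads each file once and groups the chunks by sensor, rather than running one pass per sensor.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def split_by_sensor(source_dir, output_dir):
    """Group every <data> element in source_dir/*.xml by its <sensor> text.

    Writes output_dir/sensor_<n>.xml, one file per sensor, and returns a
    dict mapping each sensor value (as a string) to the number of chunks.
    Hypothetical helper for illustration only.
    """
    chunks = defaultdict(list)
    for path in sorted(Path(source_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # iter("data") walks the whole tree, similar to //data in the XSLT
        for data in root.iter("data"):
            sensor = data.findtext("sensor")
            if sensor is None:
                continue  # skip chunks that carry no sensor number
            data.tail = "\n"  # end each serialized chunk with a newline
            chunks[sensor.strip()].append(ET.tostring(data, encoding="unicode"))

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for sensor, items in chunks.items():
        (out / f"sensor_{sensor}.xml").write_text("".join(items), encoding="utf-8")
    return {sensor: len(items) for sensor, items in chunks.items()}
```

Calling split_by_sensor(".", "output") from the directory holding the source files would write the per-sensor files into output, under the assumptions stated above.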

The contents of the output/sensor_N.xml files look like this:

output/sensor_0.xml:

    <data>
            <sensor>0</sensor>
            <a123>123456</a123>
            <a124>123457</a124>
            <a125>123458</a125>
            <a126>123459</a126>
        </data>
    <data>
            <sensor>0</sensor>
            <a123>223456</a123>
            <a124>223457</a124>
            <a125>223458</a125>
            <a126>223459</a126>
        </data>
    <data>
            <sensor>0</sensor>
            <a123>323456</a123>
            <a124>323457</a124>
            <a125>323458</a125>
            <a126>323459</a126>
        </data>

For clarity, here is the output from head -n 3 output/*.xml:

    ==> output/sensor_0.xml <==
    <data>
            <sensor>0</sensor>
            <a123>123456</a123>

    ==> output/sensor_1.xml <==
    <data>
            <sensor>1</sensor>
            <a123>123456</a123>

    ==> output/sensor_2.xml <==
    <data>
            <sensor>2</sensor>
            <a123>123456</a123>

    ==> output/sensor_3.xml <==
    <data>
            <sensor>3</sensor>
            <a123>123456</a123>

    ==> output/sensor_4.xml <==
    <data>
            <sensor>4</sensor>
            <a123>123456</a123>

Upvotes: 1
