pauldx

Reputation: 1004

Parsing XML using command line

How do I parse an XML file with the contents below?

<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1" priority="normal" jobID="36">
  <saw:schedule timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)" disabled="false">
    <saw:start repeatMinuteInterval="60" endTime="23:59:00" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly weekInterval="1" mon="true" tue="true" wed="true" thu="true" fri="true"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility type="recipient" runAs="cgm"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients subscribers="true" customize="false" specificRecipients="false">
    <saw:subscribers>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

and retrieve the output below?

[email protected]
[email protected]
[email protected]

I also have 5 .xml files, each with a different set of name values to parse. Is there any way to parse and merge them on the command line and write the output to one file?

I have tried sed and awk, but they have not helped me get the desired output.

Upvotes: 0

Views: 667

Answers (2)

Chris Davies

Reputation: 644

This command parses your XML document and uses XPath to extract the name attribute values of the elements at /saw:ibot/saw:recipients/saw:subscribers/saw:user:

xmlstarlet sel -t -v '/saw:ibot/saw:recipients/saw:subscribers/saw:user/@name' </tmp/xml

Output

[email protected]
[email protected]
[email protected]
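
The question also asks about merging the results from several files into one output. A minimal sketch of that in Python, using only the standard library rather than xmlstarlet, might look like the following. It assumes every file declares the same saw namespace URI as the sample and keeps its addresses in the name attribute of saw:user; the function names and the script name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:saw declaration in the question's XML.
NS = {"saw": "com.siebel.analytics.web/report/v1"}


def subscriber_names(xml_text):
    """Return the name attribute of every saw:user under saw:subscribers."""
    root = ET.fromstring(xml_text)  # root is the saw:ibot element
    users = root.findall("./saw:recipients/saw:subscribers/saw:user", NS)
    return [user.get("name") for user in users]


def main(paths):
    """Print the names found in each file, one per line, in argument order."""
    for path in paths:
        with open(path, "rb") as handle:
            for name in subscriber_names(handle.read()):
                print(name)


if __name__ == "__main__":
    # Hypothetical usage: python3 saw_names.py one.xml two.xml > merged.txt
    main(sys.argv[1:])
```

Redirecting standard output, as in the usage comment, is what merges the per-file results into a single file.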

Upvotes: 4

Sobrique

Reputation: 53478

Use an XML parser. Personally, I like XML::Twig and Perl.

#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( );
$twig->parsefile ( 'your_file.xml' );

foreach my $saw_user ( $twig->get_xpath('//saw:user') ) {
    print $saw_user ->att('name'), "\n";
}

This prints:

[email protected]
[email protected]
[email protected]

If you want a 'one liner' then instead:

perl -MXML::Twig -0777 -e 'print map { $_->att("name") . "\n" } ( XML::Twig->parse( <> )->get_xpath("//saw:user") )' your_xml_file

Please, for the sake of future maintenance programmers and sysadmins, DO NOT use regular expressions to parse XML. Why, you may ask? Because XML, taking yours as an example, can look like any of these and still be semantically identical:

Your original example, or like this (attributes reordered and wrapped across lines):

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
    jobID="36"
    priority="normal"
    version="1"
    xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule
      disabled="false"
      timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start
        endTime="23:59:00"
        repeatMinuteInterval="60"
        startImmediately="true"
    />
    <saw:recurrence runOnce="false">
      <saw:weekly
          fri="true"
          mon="true"
          thu="true"
          tue="true"
          wed="true"
          weekInterval="1"
      />
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility
      runAs="cgm"
      type="recipient"
  />
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard" />
    <saw:destination category="activeDeliveryProfile" />
  </saw:deliveryDestinations>
  <saw:recipients
      customize="false"
      specificRecipients="false"
      subscribers="true">
    <saw:subscribers>
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content" />
  </saw:conditionQuery>
</saw:ibot>

Or like this (note the tag wrapping of elements):

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility runAs="cgm" type="recipient"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients customize="false" specificRecipients="false" subscribers="true">
    <saw:subscribers>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

Or like this:

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
jobID="36"
priority="normal"
version="1"
xmlns:saw="com.siebel.analytics.web/report/v1"
><saw:schedule
disabled="false"
timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)"
><saw:start
endTime="23:59:00"
repeatMinuteInterval="60"
startImmediately="true"
/><saw:recurrence
runOnce="false"
><saw:weekly
fri="true"
mon="true"
thu="true"
tue="true"
wed="true"
weekInterval="1"
/></saw:recurrence></saw:schedule><saw:dataVisibility
runAs="cgm"
type="recipient"
/><saw:choose
><saw:when
condition="true"
><saw:deliveryContent
><saw:headline
><saw:caption
><saw:text
>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text></saw:caption></saw:headline><saw:conditionalReport
/></saw:deliveryContent><saw:postActions
/></saw:when><saw:otherwise
/></saw:choose><saw:deliveryDestinations
><saw:destination
category="dashboard"
/><saw:destination
category="activeDeliveryProfile"
/></saw:deliveryDestinations><saw:recipients
customize="false"
specificRecipients="false"
subscribers="true"
><saw:subscribers
><saw:user
name="[email protected]"
/><saw:user
name="[email protected]"
/><saw:user
name="[email protected]"
/></saw:subscribers></saw:recipients><saw:conditionQuery
><saw:reportRefNode
path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"
/></saw:conditionQuery></saw:ibot>

Hopefully these samples show that if someone reformats your XML in a PERFECTLY VALID fashion, your regex might one day break mysteriously.
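
To make that concrete, here is a small sketch in Python (standard library only, not the Perl used above) that runs a naive regex and a real parser over two equivalent fragments. The fragments are cut-down stand-ins for the samples, the attribute values "first" and "second" are placeholders, and the regex is just one plausible pattern someone might write, not one taken from the question.

```python
import re
import xml.etree.ElementTree as ET

NS = {"saw": "com.siebel.analytics.web/report/v1"}

# Two semantically identical fragments: one compact, one with each tag
# wrapped across lines in the style of the last sample.
COMPACT = (
    '<saw:subscribers xmlns:saw="com.siebel.analytics.web/report/v1">'
    '<saw:user name="first"/><saw:user name="second"/>'
    '</saw:subscribers>'
)
WRAPPED = (
    '<saw:subscribers xmlns:saw="com.siebel.analytics.web/report/v1"\n'
    '><saw:user\nname="first"\n/><saw:user\nname="second"\n/>'
    '</saw:subscribers>'
)

# A naive pattern that expects exactly one space between tag and attribute.
NAIVE = re.compile(r'<saw:user name="([^"]*)"')


def names_by_regex(xml_text):
    """Extract names with the naive pattern; sensitive to layout."""
    return NAIVE.findall(xml_text)


def names_by_parser(xml_text):
    """Extract names with a parser; layout does not matter."""
    root = ET.fromstring(xml_text)
    return [user.get("name") for user in root.findall("saw:user", NS)]


print(names_by_regex(COMPACT), names_by_regex(WRAPPED))
print(names_by_parser(COMPACT), names_by_parser(WRAPPED))
```

The regex finds both names in the compact fragment and none in the wrapped one, while the parser returns the same two names for both.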

Upvotes: 1
