aye decoder

Reputation: 127

XPath to return part of a string

I hope someone can advise me on my XPath query. I want to extract and display part of a string, but at the moment my query returns the full string. I want to get two results like the ones below.

<airline>
    <flight-number flight_id="flt-888712-departure-date-arrival-date-0101">01</flight-number>
    <flight-number flight_id="flt-888712-departure-date-arrival-date-0102">02</flight-number>
</airline>

This is the XML file, flights.xml, that I am working with.

<airline>
    <flight-number flight_id="flt-888712-departure-date-arrival-date-0102">01–02</flight-number>
</airline>

For example, the XPath query below returns the whole string, 01-02, but I want 01 and 02 returned separately:

/airline/flight-number/child::text()

Can someone advise me on how to achieve this with XPath?

Upvotes: 0

Views: 112

Answers (2)

Thor

Reputation: 47089

Using xmlstarlet and bash:

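# A function rather than an alias: bash does not expand aliases in
# non-interactive scripts unless expand_aliases is set.
xml() { xmlstarlet "$@"; }

# Split the element text on the separator. It must be the same character as in
# the file; use "–" if the data contains an en dash, as the question's sample does.
f1=$( xml sel -t -m 'airline/flight-number' -v 'substring-before(., "-")' infile.xml)
f2=$( xml sel -t -m 'airline/flight-number' -v 'substring-after(., "-")' infile.xml)
# flight_id without its last two characters
fid=$(xml sel -t -m 'airline/flight-number/@flight_id'               \
                 -v 'substring(., 1, string-length(.)-2)' infile.xml)

# Delete the original element, add two new ones, then attach a rebuilt flight_id to each.
xml ed -d airline/flight-number infile.xml                             |
xml ed -s airline -t elem -n flight-number -v "$f1"                    |
xml ed -s airline -t elem -n flight-number -v "$f2"                    |
xml ed -a 'airline/flight-number[1]' -t attr -n flight_id -v "$fid$f1" |
xml ed -a 'airline/flight-number[2]' -t attr -n flight_id -v "$fid$f2"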

Output:

<?xml version="1.0"?>
<airline>
  <flight-number flight_id="flt-888712-departure-date-arrival-date-0101">01</flight-number>
  <flight-number flight_id="flt-888712-departure-date-arrival-date-0102">02</flight-number>
</airline>
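
The string arithmetic behind those three variables can be checked without xmlstarlet. The following is only a sketch in plain POSIX shell: the values of text and id are assumptions copied from the question's sample (with a plain hyphen as the separator), and the parameter expansions merely mimic the XPath calls rather than replace them.

```shell
# Hypothetical sample values, copied from the question's input.
text='01-02'
id='flt-888712-departure-date-arrival-date-0102'

f1=${text%%-*}   # text before the first "-", like substring-before(., "-")
f2=${text#*-}    # text after the first "-", like substring-after(., "-")
fid=${id%??}     # id minus its last two characters, like substring(., 1, string-length(.)-2)

# Print the two rebuilt flight_id values.
printf '%s\n' "$fid$f1" "$fid$f2"
```

One difference worth keeping in mind: when the separator is missing, the XPath functions return an empty string, whereas these shell expansions leave the value unchanged.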

Upvotes: 1

urznow

Reputation: 1801

I want to be able to get two results like the ones below

With xmlstarlet, for example:

# shellcheck shell=sh disable=SC2016
xmlstarlet edit --omit-decl \
  --var fn '//flight-number[@flight_id="flt-888712-departure-date-arrival-date-0102"]' \
  -a '$fn' -t elem -n 'flight-number' \
  -u '$prev' -x '$fn/node() | $fn/@*' \
  -u '$prev/text()' -x 'substring-after(.,"–")' \
  -u '$fn/text()' -x 'substring-before(.,"–")' \
file.xml
  • --var keeps the relevant node in a variable named fn
  • -a … appends an empty same-named sibling node
  • the $prev (aka $xstar:prev) variable refers to the node created by the most recent -i (--insert), -a (--append), or -s (--subnode) option; examples of $prev are given in doc/xmlstarlet.txt
  • 1st -u … makes the new sibling a duplicate of $fn, copying its child and attribute nodes
  • 2nd -u … updates the text of the new sibling node
  • 3rd -u … updates the text of the original node
  • the XPath 1.0 string functions are documented in the XPath 1.0 specification, and xmlstarlet edit is covered in the xmlstarlet user's guide

Output:

<airline>
  <flight-number flight_id="flt-888712-departure-date-arrival-date-0102">01</flight-number>
  <flight-number flight_id="flt-888712-departure-date-arrival-date-0102">02</flight-number>
</airline>

UPDATE 2022-09-25

If the input contains a series of flights, each one can get a sibling node like this:

xmlstarlet edit --omit-decl \
  --var fln '//flight-number' \
  -a '$fln' -t elem -n 'flight-number' \
  --var sib '$fln/following-sibling::*[position() mod 2 = 1]' \
  -u '$sib' -x 'preceding-sibling::*[1]/node() | preceding-sibling::*[1]/@*' \
  -u '$fln/text()' -x 'substring-before(.,"-")' \
  -u '$sib/text()' -x 'substring-after(.,"-")' \
file.xml

where the variable sib refers to the newly appended siblings, which are interleaved with the original nodes.
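
Why position() mod 2 = 1 picks out exactly the new nodes may be easier to see with a small simulation. This is only a sketch in plain shell, not xmlstarlet: the labels origN and newN are hypothetical stand-ins for an original node and the sibling appended after it, and the list is what the following-sibling axis would contain when viewed from the first original node.

```shell
# Hypothetical labels: origN is an original node, newN the sibling appended after it.
# These are the following siblings of orig1 once every node has its new sibling.
following='new1 orig2 new2 orig3 new3'

pos=0
sib=''
for node in $following; do
  pos=$((pos + 1))
  # Keep odd positions only, as [position() mod 2 = 1] does.
  if [ $((pos % 2)) -eq 1 ]; then
    sib="${sib:+$sib }$node"
  fi
done

printf '%s\n' "$sib"
```

Starting from a later original node shifts the list but not the parity, so the union over all original nodes still contains only appended siblings.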

Sample output:

<airline>
  <flight-number flight_id="flt-888712-0102">01</flight-number>
  <flight-number flight_id="flt-888712-0102">02</flight-number>
  <flight-number flight_id="flt-123456-0910">09</flight-number>
  <flight-number flight_id="flt-123456-0910">10</flight-number>
  <flight-number flight_id="flt-789012-3031">30</flight-number>
  <flight-number flight_id="flt-789012-3031">31</flight-number>
</airline>

(end update)

Upvotes: 2
