Ram

Reputation: 173

bash: updating an XML document when item to be modified is below grepped element

I have the following XML (captured in a variable from curl output) that I need to modify in bash:

<hudson.model.StringParameterDefinition>
    <name>SPRINT</name>
    <description></description>
    <defaultValue>10</defaultValue>
</hudson.model.StringParameterDefinition>

I need to find the name tag with the value SPRINT and increment the defaultValue that follows it, so that the XML becomes:

<hudson.model.StringParameterDefinition>
    <name>SPRINT</name>
    <description></description>
    <defaultValue>11</defaultValue>
</hudson.model.StringParameterDefinition>

Quick question, how do I do that?

Details on what I've done

I am more or less able to achieve this (thanks to SO) using the following commands:

config=$(curl "$JENKINS_URL/job/$JOB_NAME/config.xml")
LAST_SPRINT=$(echo "$config" |  sed -n '/<name>SPRINT<\/name>/{n;n;p}' | sed -n -e 's/.*<defaultValue>\(.*\)<\/defaultValue>.*/\1/p')
NEW_SPRINT=$((LAST_SPRINT+1))
updated_config=$(echo "$config" | sed -e "s/<defaultValue>$LAST_SPRINT<\/defaultValue>/<defaultValue>$NEW_SPRINT<\/defaultValue>/")

This is not efficient, and potentially incorrect, because:

As a side note, the name tag containing the value SPRINT is guaranteed to occur only once in the entire XML. And yes, I know bash/sed may not be the best way to do this, but I am limited to the packages/tools present by default on SUSE Linux Enterprise Server 11.

Upvotes: 0

Views: 87

Answers (4)

Parfait

Reputation: 107567

Consider XSLT, the special-purpose language designed to transform XML files into customized structures. Bash can run such transformations with xsltproc:

XML (assumed structure)

<root>
    <hudson.model.StringParameterDefinition>
        <name>SPRINT</name>
        <description></description>
        <defaultValue>10</defaultValue>
    </hudson.model.StringParameterDefinition>
</root>

XSLT (save as .xsl)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute unchanged, -->
  <!-- so the rest of the document survives the transformation -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: increment defaultValue only inside the SPRINT parameter -->
  <xsl:template match="hudson.model.StringParameterDefinition[name='SPRINT']/defaultValue">
    <xsl:copy>
      <xsl:value-of select="number(.) + 1"/>
    </xsl:copy>
  </xsl:template>

</xsl:transform>

Bash line

shell> xsltproc transform.xsl input.xml > output.xml

Output

<root>
  <hudson.model.StringParameterDefinition>
    <name>SPRINT</name>
    <description/>
    <defaultValue>11</defaultValue>
  </hudson.model.StringParameterDefinition>
</root>

Upvotes: 0

Charles Duffy

Reputation: 295291

The right tool for this job is not grep, but an XML-aware tool such as XMLStarlet.

To do the increment-by-one you requested natively, that would look like:

xmlstarlet ed \
  -u '//hudson.model.StringParameterDefinition[./name="SPRINT"]/defaultValue' \
  -x '. + 1' <in.xml >out.xml

To instead set a value that is already known to your outer shell script:

newValue=20

xmlstarlet ed \
  -u '//hudson.model.StringParameterDefinition[./name="SPRINT"]/defaultValue' \
  -v "$newValue" <in.xml >out.xml

The above were tested against the following document, which just wraps a root element around what you already had:

<root>
  <hudson.model.StringParameterDefinition>
    <name>SPRINT</name>
    <description/>
    <defaultValue>10</defaultValue>
  </hudson.model.StringParameterDefinition>
</root>

If it doesn't work against your actual document, the most likely cause is that these elements are not in the default namespace because of an xmlns= declaration at a higher level in the document. When asking questions that involve XML documents, include such declarations so that answers can account for them.
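
To illustrate the namespace caveat, here is a minimal Python sketch of how the lookup changes when an xmlns= declaration is in effect. The namespace URI urn:example:hypothetical, the SAMPLE document, and the function name increment_sprint_ns are made up for illustration; nothing in the question says the real config.xml declares a namespace. ElementTree spells namespaced tags as {uri}tag, so the same element names stop matching unless that prefix is included.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: identical to the test document except for xmlns=
SAMPLE = """<root xmlns="urn:example:hypothetical">
  <hudson.model.StringParameterDefinition>
    <name>SPRINT</name>
    <description/>
    <defaultValue>10</defaultValue>
  </hudson.model.StringParameterDefinition>
</root>"""

def increment_sprint_ns(xml_text, ns_uri=None):
    """Increment SPRINT's defaultValue; ns_uri is the default namespace, if any."""
    # ElementTree stores namespaced tags as "{uri}localname"
    prefix = "{%s}" % ns_uri if ns_uri else ""
    doc = ET.fromstring(xml_text)
    for node in doc.iter(prefix + "hudson.model.StringParameterDefinition"):
        name_el = node.find(prefix + "name")
        default_el = node.find(prefix + "defaultValue")
        if name_el is None or default_el is None:
            continue
        if name_el.text == "SPRINT":
            default_el.text = str(int(default_el.text) + 1)
    return doc
```

Called without ns_uri on SAMPLE, the function finds nothing to update, which is the failure mode described above. Passing the declared URI makes the same lookups succeed.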


Also, a Python answer using only modules built into the 2.7-series interpreter (since you specified that you can't install 3rd-party tools):

#!/usr/bin/env python
import sys, xml.etree.ElementTree as ET
doc = ET.fromstring(sys.stdin.read())
for node in doc.findall('.//hudson.model.StringParameterDefinition'):
    name_el = node.find('./name')
    if name_el is not None and name_el.text == 'SPRINT':
        default_el = node.find('./defaultValue')
        if default_el is None: continue
        default_el.text = str(int(default_el.text) + 1)
print ET.tostring(doc)

...pipe your content through that script, and the defaultValue for the SPRINT parameter will be incremented.

To wrap this in a shell function would look something like the following:

# assign python code to a shell variable
_increment_sprint_script=$(cat <<'EOF'
import sys, xml.etree.ElementTree as ET
doc = ET.fromstring(sys.stdin.read())
for node in doc.findall('.//hudson.model.StringParameterDefinition'):
    name_el = node.find('./name')
    if name_el is not None and name_el.text == 'SPRINT':
        default_el = node.find('./defaultValue')
        if default_el is None: continue
        default_el.text = str(int(default_el.text) + 1)
print ET.tostring(doc)
EOF
)

# define a function that calls the interpreter with that code
increment_sprint() { python -c "$_increment_sprint_script" "$@"; }

# ...then you can just pipe through it.
updated_config=$(curl "$JENKINS_URL/job/$JOB_NAME/config.xml" | increment_sprint)  
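
The script above uses the Python 2 print statement, so it will not run under Python 3. If only a Python 3 interpreter is available, a roughly equivalent sketch could look like the following. The function name increment_sprint_py3 is an illustrative choice, not part of the original answer.

```python
import xml.etree.ElementTree as ET

def increment_sprint_py3(xml_text):
    """Return xml_text with the SPRINT parameter's defaultValue incremented by one."""
    doc = ET.fromstring(xml_text)
    for node in doc.findall(".//hudson.model.StringParameterDefinition"):
        name_el = node.find("./name")
        if name_el is not None and name_el.text == "SPRINT":
            default_el = node.find("./defaultValue")
            if default_el is None:
                continue
            default_el.text = str(int(default_el.text) + 1)
    # encoding="unicode" makes tostring() return str rather than bytes
    return ET.tostring(doc, encoding="unicode")

# To use it as a filter, as in the shell function above:
#   import sys
#   sys.stdout.write(increment_sprint_py3(sys.stdin.read()))
```

As with the original, findall('.//...') searches only descendants of the root, so this assumes the parameter definition is nested somewhere below the document's root element.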

Upvotes: 3

JNevill

Reputation: 50019

Lord help me this is the ugliest awk one liner I've ever built, but it will do the job if no one else offers up anything prettier in sed; or awk, for that matter.

I believe you could get something much cleaner with some regex, but I was worried it might be more error prone with oddball edge cases and the like.

Furthermore, sed and awk and grep are just not great tools for this, as you mentioned. Literally any other tool that can handle XML gracefully would be preferred over this.

Anyway, here's my abomination:

awk -F"[<>]" 'foundSprint==1 && $2=="defaultValue" {print $1"<"$2">"($3+1)"<"$4">"; foundSprint=0; next} $2=="name" && $3=="SPRINT" {foundSprint=1} {print $0}' infile > outfile

There are three blocks in this script.

  1. If foundSprint is set and the tag is defaultValue, the value is in field 3 ($3) once the line is split on "<" or ">". Print the line with that value incremented, putting back the "<" and ">" characters that awk consumed as delimiters. Then clear foundSprint and call next, which skips the remaining blocks so the line is not printed a second time in step 3.
  2. Here we look for SPRINT in a name tag. If we find it, we set the foundSprint variable to 1 for use in step 1.
  3. Finally, every line that was not handled in step 1 is printed unchanged.
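
The same line-by-line logic can be sketched in Python, which may make the three steps easier to follow. This is an illustration only: the function name increment_after_sprint and the regular expression are assumptions, and like the awk version it relies on each tag sitting on its own line.

```python
import re

# Matches a line holding only <defaultValue>N</defaultValue>, keeping indentation
DEFAULT_RE = re.compile(r"^(\s*<defaultValue>)(\d+)(</defaultValue>\s*)$")

def increment_after_sprint(lines):
    """Yield lines, incrementing the first defaultValue after <name>SPRINT</name>."""
    found_sprint = False
    for line in lines:
        match = DEFAULT_RE.match(line) if found_sprint else None
        if match:
            # Step 1: rebuild the line with the incremented number, then move on
            yield "%s%d%s" % (match.group(1), int(match.group(2)) + 1, match.group(3))
            found_sprint = False
            continue
        if "<name>SPRINT</name>" in line:
            # Step 2: remember that the next defaultValue belongs to SPRINT
            found_sprint = True
        # Step 3: every other line passes through unchanged
        yield line
```

Because the flag is cleared after the first match, a later defaultValue belonging to another parameter is left alone.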

Upvotes: 1

Thijs Dalhuijsen

Reputation: 778

Since you have "BASH" in the title, have a look at this (pure) bash implementation :) No other binaries needed, and it does what you want.

inputfile contains your given example text

Enjoy, --bvk

#!/bin/bash
FILEN=inputfile.txt

incrementit() {
        local line
        REPLACING=0
        # IFS= and -r keep leading whitespace and backslashes in each line intact
        while IFS= read -r line; do
                if [[ "$line" == *"<name>SPRINT</name>"* ]] && [[ $REPLACING == 0 ]]; then
                        REPLACING=1
                fi
                if [[ "$line" == *"<defaultValue>"*"</defaultValue>"* ]] && [[ $REPLACING == 1 ]]; then
                        CURVER="${line//<defaultValue>/}"
                        CURVER="${CURVER//<\/defaultValue>/}"
                        echo -e "    <defaultValue>"$(( CURVER + 1 ))"</defaultValue>"
                        REPLACING=0
                else
                        echo "$line"
                fi
        done
}

incrementit <"$FILEN"

Upvotes: 0
