Mahesh

Reputation: 61

Extract data from xml file using shell commands

I have an XML file with the content below. How can I extract the username and password values from the Resource tag using a shell script? The commented-out Resource tag must be excluded, so the values should come only from the uncommented Resource tag. What I tried fetched the values from the latest tag. Can someone show me how to strip the commented tags and fetch the values from the XML?

<?xml version='1.0' encoding='utf-8'?>
<!-- The contents of this file will be loaded for each web application -->
<!--
 <Resource name="jdbcSource" auth="Container"
type="javax.sql.DataSource"
 username="demo"
    password="test"
        driverClassName="driverclassname"
        url="driver@host"
    maxActive="20"
    maxIdle="10"
     />

-->

<Resource auth="Container"
driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
name="jdbcSource" password="test" type="javax.sql.DataSource"
url="driver@host"
username="demo"/>

</Context>

Upvotes: 0

Views: 10632

Answers (4)

Reino

Reputation: 3423

RobC already explained why you shouldn't use native Bash tools to parse HTML/XML. I'd recommend a dedicated tool like xidel.

I've added an opening <Context>, as shown by m.nguyencntt, and saved your xml-file as so_54034541.xml.

With command substitution you could of course set the variables by calling xidel twice...

uname=$(xidel -s so_54034541.xml -e '//Resource/@username')
pword=$(xidel -s so_54034541.xml -e '//Resource/@password')

...but xidel also has its own way to export (multiple) variables:

xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)'
uname := demo   # Internal variables for use within the extraction query itself.
pword := test

xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)' --output-format=bash
uname='demo'   # At the moment these are just strings.
pword='test'   # Use Bash's eval built-in command to actually set/export these variables.

eval "$(xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)' --output-format=bash)"

echo "$uname $pword"
demo test

Upvotes: 1

m.nguyencntt

Reputation: 943

I did it as follows.

I created yourxmlfile.xml:

<Context>
    <Resource auth="Container"
    driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
    name="jdbcSource" password="test" type="javax.sql.DataSource"
    url="driver@host"
    username="demo"/>
</Context>

sed -n 's/.* password="\([^"]*\)".*/\1/p' yourxmlfile.xml

  test
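
The command above does not skip the commented-out Resource block from the question. As a rough sketch only (the other answers explain why regex-based XML handling is fragile), comments can be stripped with sed before extracting. The temporary sample file, the strip_comments helper, and the olduser/oldpass values in the commented block are assumptions made up for illustration, and the approach assumes a multi-line comment opens and closes on lines of its own:

```shell
#!/bin/sh
# Hypothetical sample file mirroring the question's layout; the commented
# block uses different values so it is obvious which block was read.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<!-- The contents of this file will be loaded for each web application -->
<Context>
<!--
 <Resource name="jdbcSource" auth="Container"
   username="olduser"
   password="oldpass"
 />
-->
<Resource auth="Container"
name="jdbcSource" password="test" type="javax.sql.DataSource"
username="demo"/>
</Context>
EOF

# First drop comments that open and close on one line, then delete every
# line from one containing "<!--" through the next one containing "-->".
# Caveat: the first pattern is greedy, so two comments on one line would
# also lose the text between them.
strip_comments() {
    sed -e 's/<!--.*-->//g' -e '/<!--/,/-->/d' "$1"
}

uname=$(strip_comments "$sample" | sed -n 's/.*username="\([^"]*\)".*/\1/p')
pword=$(strip_comments "$sample" | sed -n 's/.*password="\([^"]*\)".*/\1/p')

echo "$uname $pword"
```

This still breaks on things like single-quoted attribute values or more than one live Resource element, so treat it as a stopgap rather than a parser.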

Upvotes: 0

RobC

Reputation: 24952

Firstly, my answer assumes that you have actual well-formed source XML. The example code you've provided isn't XML as it doesn't have an opening root element, namely <Context> - but I'll assume there is one anyway.


Bash features by themselves are not very well suited to parsing XML.

This Bash FAQ states the following:

Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)

If you must use a shell script then utilize an XML specific command line tool, such as XMLStarlet (there are other similar tools available). See download info here - if you don't already have XML Starlet installed.

Solution:

Using XML Starlet you can run the following commands:

uname=$(xml sel -t -v "/Context/Resource/@username" path/to/file.xml)
pword=$(xml sel -t -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

Explanation

  • uname=$(...)

    Here we utilize command substitution to assign the output of the XML Starlet command to a variable named uname (i.e. the username).

  • xml sel -t -v "/Context/Resource/@username"

    This command breaks down as follows:

    • xml - invoke the XML Starlet command.
    • sel - select data or query XML document(s).
    • -t - the template option.
    • -v - print the value of XPATH expression.
    • "/Context/Resource/@username" - the expression to select the value of the username attribute of the Resource tag/element.
  • path/to/file.xml

    This part should be replaced with the real path to your .xml file.

Likewise, we utilize a similar command for obtaining the value of the password attribute, whereby we assign the output of the command to a variable named pword, and change the XPATH expression.
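
One practical wrinkle: on some systems (Debian and Ubuntu packages, for example) the XML Starlet binary is installed as xmlstarlet rather than xml. A minimal sketch that picks whichever name is available before running the same queries follows. The find_xmlstarlet helper and the xml_file variable are hypothetical names introduced here, and path/to/file.xml remains a placeholder:

```shell
#!/bin/sh
# Print the first XML Starlet command name found on PATH.
# Returns non-zero if neither name is available.
find_xmlstarlet() {
    for candidate in xmlstarlet xml; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

xml_file=${1:-path/to/file.xml}   # placeholder path, replace with your own

if ! xml_cmd=$(find_xmlstarlet); then
    echo "XML Starlet not found (tried: xmlstarlet, xml)" >&2
elif [ ! -r "$xml_file" ]; then
    echo "Cannot read $xml_file" >&2
else
    # Same queries as above, just with the detected command name.
    uname=$("$xml_cmd" sel -t -v "/Context/Resource/@username" "$xml_file")
    pword=$("$xml_cmd" sel -t -v "/Context/Resource/@password" "$xml_file")
    echo "$uname $pword"
fi
```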


Edit 1: A more efficient command

As per Charles Duffy's first comment below... you can also extract both attribute values more efficiently using the following command instead:

{ IFS= read -r uname && IFS= read -r pword; } < <(xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

The main benefit here is that the source XML file is only read once.
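
Process substitution, i.e. the < <(...) part, is a Bash feature. If the script has to run under plain sh, one portable alternative is to feed the two reads from a here-document whose body is a command substitution. In the sketch below, emit_credentials is a hypothetical stand-in for the xml sel command, so the snippet runs even without XML Starlet installed:

```shell
#!/bin/sh
# Hypothetical stand-in for:
#   xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml
# It simply prints the two values on separate lines.
emit_credentials() {
    printf '%s\n%s\n' 'demo' 'test'
}

# The command substitution runs once; its output becomes the body of the
# here-document, and each read consumes one line of it.
{ IFS= read -r uname && IFS= read -r pword; } <<EOF
$(emit_credentials)
EOF

echo "$uname $pword"
```

As with the Bash version, the command is executed only once, and the brace group runs in the current shell, so both variables remain set afterwards.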


Edit 2: Using XML Starlet to generate an XSLT template that can then be run on any system with xsltproc, including hosts that don't have XML Starlet installed:

As per Charles Duffy's second comment below...

It's also possible to utilize XML Starlet to generate an XSLT template which is derived from the XML Starlet query shown previously. The .xsl file which is generated can then be run on any system which has xsltproc available (including hosts that don't have XML Starlet installed).

The following steps demonstrate how to achieve this:

  1. Firstly run the following XML Starlet command to generate the .xsl file:

    xml sel -C -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml > path/to/resultant/my-template.xsl
    

    This command is very similar to the previously shown XML Starlet command. The notable differences are:

    • The additional -C option between sel and -t
    • The redirection operator > and a file path. This specifies the location at which to save the output, (i.e. the generated XSLT template/stylesheet).

      Note the path/to/resultant/my-template.xsl part should be changed as necessary.

    The contents of the generated XSLT stylesheet will be something like the following:

    my-template.xsl

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
      <xsl:output omit-xml-declaration="yes" indent="no"/>
      <xsl:template match="/">
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@username"/>
        </xsl:call-template>
        <xsl:value-of select="'&#10;'"/>
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@password"/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="value-of-template">
        <xsl:param name="select"/>
        <xsl:value-of select="$select"/>
        <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
          <xsl:value-of select="'&#10;'"/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    
  2. Next, run the following command which utilizes xsltproc to transform the source .xml file. This ultimately assigns the result of the transformation to the two variables, i.e. uname and pword:

    { IFS= read -r uname && IFS= read -r pword; } < <(xsltproc path/to/resultant/my-template.xsl path/to/file.xml)
    
    echo "$uname $pword" # --> demo test
    

    Note the parts reading path/to/resultant/my-template.xsl and path/to/file.xml should be changed as necessary.


Upvotes: 2

Nahuel Fouilleul

Reputation: 19305

With a Perl one-liner:

perl -n0777E '
    # remove comments
    s/<!--.*?-->//gs;

    # match username and password with lookaheads and display in custom way
    say "user:$1\tpass:$2" while /<Resource(?=[^>]*\susername="([^"]*)")(?=[^>]*\spassword="([^"]*)")[^>]*>/g
' < file.xml

Upvotes: 0
