Reputation: 702
Here is a sample file; we need to convert its values into a delimiter-separated file:
test.xml
<?xml version="1.0" encoding="UTF-8" ?>
<testjar>
<testable>
<trigger>Trigger1</trigger>
<message>2012-06-14T00:03.54</message>
<sales-info>
<san-a>no</san-a>
<san-b>no</san-b>
<san-c>no</san-c>
</sales-info>
</testable>
<testable>
<trigger>Trigger2</trigger>
<message>2012-06-15T00:03.54</message>
<sales-info>
<san-a>yes</san-a>
<san-b>yes</san-b>
<san-c>no</san-c>
</sales-info>
</testable>
</testjar>
Each record should start on a new line. The result should look something like this (sample.txt):
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no
Note: xmlstarlet is not installed on my server. Is it possible to do this without xmlstarlet?
Upvotes: 0
Views: 5218
Reputation: 86944
If you have xmlstarlet installed, you can try:
me@home$ xmlstarlet sel -t -m "//testable" -v trigger -o "|" -v message -o "|" -m sales-info -v san-a -o "|" -v san-b -o "|" -v san-c -n test.xml
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no
Breakdown of the command:
xmlstarlet sel -t
-m "//testable" # match <testable>
-v trigger -o "|" # print out value of <trigger> followed by |
-v message -o "|" # print out value of <message> followed by |
-m sales-info # match <sales-info>
-v san-a -o "|" # print out value of <san-a> followed by |
-v san-b -o "|" # print out value of <san-b> followed by |
-v san-c # print out value of <san-c>
-n # print new line
test.xml # INPUT XML FILE
To handle tags that vary within <testable>, you can try the following, which returns the text of all leaf nodes:
me@home$ xmlstarlet sel -t -m "//testable" -m "descendant::*[not(*)]" -v 'text()' -i 'not(position()=last())' -o '|' -b -b -n test.xml
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no
Breakdown of the command:
xmlstarlet sel -t
-m "//testable" # match <testable>
-m "descendant::*[not(*)]" # match all leaf nodes
-v 'text()' # print text
-i 'not(position()=last())' -o '|' # print | if not last item
-b -b # close the -i block, then the inner -m match
-n # print new line
test.xml # INPUT XML FILE
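The leaf-node idea is not tied to xmlstarlet. As a rough sketch, assuming Python 3 with only its standard library is available, the same selection might look like the following. The function name leaf_values and the inline one-record sample are made up for illustration; for the real file you would parse test.xml with ET.parse instead.

```python
import xml.etree.ElementTree as ET

# Inline one-record sample (an assumption for the demo; normally read test.xml).
SAMPLE = b"""<testjar>
  <testable>
    <trigger>Trigger1</trigger>
    <message>2012-06-14T00:03.54</message>
    <sales-info>
      <san-a>no</san-a>
      <san-b>no</san-b>
      <san-c>no</san-c>
    </sales-info>
  </testable>
</testjar>"""


def leaf_values(record):
    """Return the text of every descendant of record that has no child elements."""
    return [
        (el.text or "").strip()
        for el in record.iter()
        # iter() also yields record itself; skip it, and keep only leaves
        if el is not record and len(el) == 0
    ]


root = ET.fromstring(SAMPLE)
lines = ["|".join(leaf_values(rec)) for rec in root.iter("testable")]
print("\n".join(lines))
# Trigger1|2012-06-14T00:03.54|no|no|no
```

Like the xmlstarlet version, this emits whatever leaf elements are present, so the number of columns follows the input rather than a fixed list.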
If you do not have access to xmlstarlet, then do look up what other tools you have at your disposal. Other options would include xsltproc (see mzjn's answer) and xpath.
If those tools are not available, I would suggest using a higher level language (Python, Perl) which gives you access to a proper XML library.
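As one possible illustration of the higher-level-language route, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The FIELDS list, the to_delimited function and the inline sample are assumptions made for this example, not part of any existing tool; to process the real file, replace ET.fromstring(SAMPLE) with ET.parse("test.xml").getroot().

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's data (an assumption for the demo).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<testjar>
  <testable>
    <trigger>Trigger1</trigger>
    <message>2012-06-14T00:03.54</message>
    <sales-info><san-a>no</san-a><san-b>no</san-b><san-c>no</san-c></sales-info>
  </testable>
  <testable>
    <trigger>Trigger2</trigger>
    <message>2012-06-15T00:03.54</message>
    <sales-info><san-a>yes</san-a><san-b>yes</san-b><san-c>no</san-c></sales-info>
  </testable>
</testjar>"""

# Paths of the fields to emit, relative to each <testable> element.
FIELDS = [
    "trigger",
    "message",
    "sales-info/san-a",
    "sales-info/san-b",
    "sales-info/san-c",
]


def to_delimited(root, sep="|"):
    """Return one delimited line per <testable> element under root."""
    lines = []
    for rec in root.iter("testable"):
        # findtext falls back to "" when a field is missing from a record
        values = [rec.findtext(path, default="").strip() for path in FIELDS]
        lines.append(sep.join(values))
    return lines


result = to_delimited(ET.fromstring(SAMPLE))
print("\n".join(result))
# Trigger1|2012-06-14T00:03.54|no|no|no
# Trigger2|2012-06-15T00:03.54|yes|yes|no
```

Because the parser works on the element tree rather than on lines of text, the output does not depend on how the XML is indented or wrapped.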
While it is possible to parse it manually using regex, such a solution would not be ideal, especially with inconsistent inputs. For example, the following (assuming you have gawk and sed) takes your input and should produce the expected output:
me@home$ gawk 'match($0, />(.*)</, a){printf("%s|",a[1])} /<\/testable>/{print ""}' test.xml | sed 's/.$//'
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no
However, this would fail miserably if the input format changes and is therefore not a solution I would generally recommend.
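To make the failure mode concrete, here is a small sketch that applies the same greedy pattern as the gawk command above, using Python's re module purely for demonstration. The two input strings are invented for the example: the first has one element per line, as in your file; the second has two elements on one line, which is equally valid XML.

```python
import re

# Same greedy pattern as in the gawk command: everything between
# the first ">" and the last "<" on the line.
PATTERN = re.compile(r">(.*)<")

one_per_line = "<trigger>Trigger1</trigger>"
two_per_line = "<trigger>Trigger1</trigger><message>2012-06-14T00:03.54</message>"

first = PATTERN.search(one_per_line).group(1)
second = PATTERN.search(two_per_line).group(1)

print(first)
# Trigger1
print(second)
# Trigger1</trigger><message>2012-06-14T00:03.54
```

In the second case the markup leaks into the captured value, which is the kind of breakage a real XML parser avoids.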
Upvotes: 1
Reputation: 38255
Here's a solution that needs only standard shell tools (grep, sed and a bash loop):
grep -E '<trigger>|<message>|<san-.>' test.xml | sed -e 's/<[^>]*>//g' | while read -r line; do if [ $((++i % 5)) -ne 0 ]; then printf '%s|' "$line"; else printf '%s\n' "$line"; fi; done
However, it only works on a file formatted exactly like your sample (each element on its own line). It is nowhere near as flexible or reliable as the other answers, which use proper XML parsing or transformation.
It can be enhanced to some extent though...
Upvotes: 1
Reputation: 51032
Here is an XSLT stylesheet that does what you want (saved in test.xsl):
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="testable">
    <xsl:value-of select='trigger'/><xsl:text>|</xsl:text>
    <xsl:value-of select='message'/><xsl:text>|</xsl:text>
    <xsl:value-of select='sales-info/san-a'/><xsl:text>|</xsl:text>
    <xsl:value-of select='sales-info/san-b'/><xsl:text>|</xsl:text>
    <xsl:value-of select='sales-info/san-c'/><xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
Command (here I am assuming that you have libxml2 and libxslt installed; xsltproc is a command line tool that uses these libraries):
xsltproc -o sample.txt test.xsl test.xml
Contents of sample.txt:
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no
Upvotes: 1