vsraju

Reputation: 93

How to sort an XML file in UNIX

We have an XML file with one record per line (dates are in DD/MM/YY format):

<ABT><pid>101</pid><date>10/12/13</date><name>AAA</name></ABT>
<ABT><pid>102</pid><date>11/12/13</date><name>BBB</name></ABT>
<ABT><pid>101</pid><date>09/12/13</date><name>AAA</name></ABT>
<ABT><pid>102</pid><date>24/12/13</date><name>BBB</name></ABT>
<JRE><pid>101</pid><date>01/12/13</date><name>AAA</name></JRE>
<JRE><pid>102</pid><date>02/12/13</date><name>BBB</name></JRE>

The output should be:

<JRE><pid>101</pid><date>01/12/13</date><name>AAA</name></JRE>
<ABT><pid>101</pid><date>09/12/13</date><name>AAA</name></ABT>
<ABT><pid>101</pid><date>10/12/13</date><name>AAA</name></ABT>
<JRE><pid>102</pid><date>02/12/13</date><name>BBB</name></JRE>
<ABT><pid>102</pid><date>11/12/13</date><name>BBB</name></ABT>
<ABT><pid>102</pid><date>24/12/13</date><name>BBB</name></ABT>

How can I sort this file by <pid> and then by <date>?

Upvotes: 4

Views: 4759

Answers (2)

user37421

Reputation: 435

Sorting with xidel

This command uses xidel to sort the file named file.xml by pid and then by date:

xidel --xquery 'for $i in doc("file.xml")/* order by $i/pid, $i/date return $i' --output-format xml

Considering the root node

A well-formed XML document needs exactly one root element.

So, add a root node to the input:

<root>
    <ABT>
        <pid>101</pid>
        <date>10/12/13</date>
        <name>AAA</name>
    </ABT>
    <ABT>
        <pid>102</pid>
        <date>11/12/13</date>
        <name>BBB</name>
    </ABT>
    <ABT>
        <pid>101</pid>
        <date>09/12/13</date>
        <name>AAA</name>
    </ABT>
    <ABT>
        <pid>102</pid>
        <date>24/12/13</date>
        <name>BBB</name>
    </ABT>
    <JRE>
        <pid>101</pid>
        <date>01/12/13</date>
        <name>AAA</name>
    </JRE>
    <JRE>
        <pid>102</pid>
        <date>02/12/13</date>
        <name>BBB</name>
    </JRE>
</root>
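
If the records arrive as a bare list of elements, the wrapping step can be scripted. This is only a minimal sketch: the function name wrap_in_root and the input name records.xml are hypothetical, and it assumes the input has no XML declaration of its own.

```shell
# Hypothetical helper: wrap a file of sibling elements in a single <root> element.
# Assumes the input file does not start with an <?xml ...?> declaration.
wrap_in_root() {
    printf '<root>\n'
    cat "$1"
    printf '</root>\n'
}

# Usage (records.xml is a hypothetical input name):
#   wrap_in_root records.xml > file.xml
```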

Note: xidel can handle the multiple-root input anyway.

The following assumes that every direct child of the root node has <pid> and <date> children, and that your input XML file is named file.xml.

This command uses xidel to return the children of the root node, sorted by their <pid> and <date> children and wrapped in a new <root> element:

xidel --xquery '<root>
                {for $i in doc("file.xml")/*/*
                order by $i/pid, $i/date
                return $i}
                </root>' --output-format xml

The same command on one line:

xidel --xquery '<root>{for $i in doc("file.xml")/*/* order by $i/pid, $i/date return $i}</root>' --output-format xml

Upvotes: 3

fedorqui

Reputation: 290515

I would use sort together with sed. If you want to order first by pid and then by date, add a space after each of these tags and then sort accordingly:

$ sed -e 's/<pid>/& /' -e 's/<date>/& /' file | sort -nk2 -k3 | sed 's/ //g'
<JRE><pid>101</pid><date>01/12/13</date><name>AAA</name></JRE>
<ABT><pid>101</pid><date>09/12/13</date><name>AAA</name></ABT>
<ABT><pid>101</pid><date>10/12/13</date><name>AAA</name></ABT>
<JRE><pid>102</pid><date>02/12/13</date><name>BBB</name></JRE>
<ABT><pid>102</pid><date>11/12/13</date><name>BBB</name></ABT>
<ABT><pid>102</pid><date>24/12/13</date><name>BBB</name></ABT>

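The first sed adds a space after the <pid> and <date> tags, so that the pid and date values start columns 2 and 3; the last sed removes the spaces again. In between, sort -n -k2 -k3 sorts numerically (-n), first on column 2 (-k2) and then on column 3 (-k3). Note that -n reads only the leading day number of each date, so the result is chronological only when all dates fall in the same month. The final sed also removes every space in the line, so this assumes the data itself contains none.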
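
For dates that cross month or year boundaries, one option is a decorate-sort-undecorate pipeline: prefix each line with the pid and the date rearranged as YYMMDD, sort on those keys, then cut them off again. The sketch below is not part of the original answer; the function name sort_by_pid_date is hypothetical, and it assumes one record per line, exactly one <pid> and one DD/MM/YY <date> per record, and two-digit years within the same century.

```shell
# sort_by_pid_date is a hypothetical helper name.
# Assumes one record per line, each with one <pid> and one DD/MM/YY <date>,
# and two-digit years that all fall in the same century.
# Pipeline: awk decorates each line with sort keys, sort orders by pid
# (numeric) and then by YYMMDD (string), cut strips the keys again.
sort_by_pid_date() {
    awk '{
        # Take the numeric prefix of the text following <pid>.
        pid = substr($0, index($0, "<pid>") + 5) + 0
        # Take the 8 characters following <date>, e.g. 10/12/13.
        split(substr($0, index($0, "<date>") + 6, 8), d, "/")
        # Decorate: pid, then the date as YYMMDD, then the original record.
        printf "%d\t%s%s%s\t%s\n", pid, d[3], d[2], d[1], $0
    }' "$1" |
        sort -t "$(printf '\t')" -k1,1n -k2,2 |
        cut -f3-
}

# Usage:
#   sort_by_pid_date file
```

Because the keys travel in front of the record and are removed with cut, the original lines are emitted unchanged, including any spaces inside the data.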

Upvotes: 2
