user4421459

iterate over xml with xmlstarlet and output parent and child node values

I have this format in multiple XML files:

<bad>
 <objdesc>
 <desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
        <physdesc>adfa;sdfkjad</physdesc>
        <related objectid="bb435.1.comdes.02"/>
        <related objectid="but614r.1.penc.01"/>
        <related objectid="but611.1.wc.01"/>
        <related objectid="but612.1.wd.01"/>
        <related objectid="bb515.1.comb.12"/>
 </desc>
 <desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
        <physdesc>alkdjfa;sfjsdf</physdesc>
        <related objectid="but621r.1.penc.01"/>
        <related objectid="bb435.1.comdes.03"/>
 </desc>
 </objdesc>
 </bad>

I want output that looks like this:

butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"

butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"  

I have a bash script that uses xmlstarlet to iterate over the xml files in a directory, but it dumps all the "related values" after the last desc id. It needs to associate each desc id with each set of "related" values. And it needs to include the dbi value for each id.

#!/bin/bash

for x in *.xml
do
    id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
    arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
    cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done

Upvotes: 1

Views: 2403

Answers (3)

focog77269

Reputation: 154

$ xml sel -t -m bad/objdesc/desc -v "concat(@id,' dbi=',@dbi,' ')" -m related -v @objectid -i "number(count(./preceding-sibling::related))+1<number(count(./../related))" -o ", " --else -n -b file.xml

butwba10.1.wc.01 dbi=BUTWBA10.1.1.WC bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12
butwba10.1.wc.02 dbi=BUTWBA10.1.2.WC but621r.1.penc.01, bb435.1.comdes.03

Upvotes: 0

glenn jackman

Reputation: 246744

I agree with sputnick that XSLT is the right tool. Nevertheless, here is a perl answer using an XML token parser. It has the advantage that it only has to process each file once, instead of invoking xmlstarlet repeatedly:

#!perl

use strict;
use warnings;
use XML::Parser;

my (@related, @desc); # boo, global variables

sub start {
    my ($x, $elem, %attrs) = @_;
    if ($elem eq "desc") {
        @desc = @attrs{'id', 'dbi'};
        @related = ();
    }
    elsif ($elem eq "related") {
        push @related, $attrs{objectid};
    }
}

sub end {
    my ($x, $elem) = @_;
    if ($elem eq "desc") {
        printf qq{%s dbi="%s" related="%s"\n}, @desc, join(', ', @related);
    }
}

my $parser = XML::Parser->new( Handlers => {Start => \&start, End => \&end} );
$parser->parsefile($ARGV[0]);

in action:

$ perl parse.pl file 
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"

Upvotes: 1

Gilles Quénot

Reputation: 184955

#!/bin/bash

for x in *.xml; do
  count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
  for ((i=1; i<=count; i++)); do
    id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
    arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
    cat<<EOF
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
  done
done

=)

It seems like this is a job for XSLT. But, OK, shell can handle this too...

Can you do the rest for dbi? It's better to try to understand what is involved here than to just cut and paste.
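For reference, a sketch of how the dbi part could be filled in. It assumes the xmlstarlet binary is installed as xml, as in the question (on some systems it is called xmlstarlet), and bash 4+ for mapfile. The helper names join_by, format_desc and describe_file are made up for this example. The join is done in bash rather than perl so that the separator can be ", " as in the desired output.

```shell
#!/bin/bash

# Hypothetical helper: join the remaining arguments with the separator in $1.
join_by() {
  local sep=$1
  shift
  local out=${1-}
  if (( $# > 0 )); then
    shift
  fi
  local item
  for item in "$@"; do
    out+="$sep$item"
  done
  printf '%s' "$out"
}

# Hypothetical helper: print one output line for a single desc element.
# Arguments: id, dbi, then zero or more related objectid values.
format_desc() {
  local id=$1 dbi=$2
  shift 2
  printf '%s dbi="%s" related="%s"\n' "$id" "$dbi" "$(join_by ', ' "$@")"
}

# Walk the desc elements of one file by position, as in the loop above,
# and additionally read @dbi for each of them.
# Assumption: the xmlstarlet binary is available under the name "xml".
describe_file() {
  local file=$1 count i id dbi
  local -a related
  count=$(xml sel -t -v 'count(/bad/objdesc/desc)' "$file")
  for ((i = 1; i <= count; i++)); do
    id=$(xml sel -t -v "/bad/objdesc/desc[$i]/@id" "$file")
    dbi=$(xml sel -t -v "/bad/objdesc/desc[$i]/@dbi" "$file")
    # One objectid per line of output; mapfile keeps each line as one element.
    mapfile -t related < <(xml sel -t -v "/bad/objdesc/desc[$i]/related/@objectid" "$file")
    format_desc "$id" "$dbi" "${related[@]}"
  done
}
```

Calling describe_file "$x" inside the for x in *.xml loop should then print one line per desc. This still starts xmlstarlet several times per desc, so for large files a single-pass approach is likely to be faster.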

Upvotes: 1
