Reputation:
I have this format in multiple XML files:
<bad>
<objdesc>
<desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
<physdesc>adfa;sdfkjad</physdesc>
<related objectid="bb435.1.comdes.02"/>
<related objectid="but614r.1.penc.01"/>
<related objectid="but611.1.wc.01"/>
<related objectid="but612.1.wd.01"/>
<related objectid="bb515.1.comb.12"/>
</desc>
<desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
<physdesc>alkdjfa;sfjsdf</physdesc>
<related objectid="but621r.1.penc.01"/>
<related objectid="bb435.1.comdes.03"/>
</desc>
</objdesc>
</bad>
I want output that looks like this:
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
I have a bash script that uses xmlstarlet to iterate over the XML files in a directory, but it dumps all the "related" values after the last desc id. It needs to associate each desc id with its own set of "related" values, and it needs to include the dbi value for each id.
#!/bin/bash
for x in *.xml
do
    id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
    arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
    cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done
Upvotes: 1
Views: 2403
Reputation: 154
$ xml sel -t -m bad/objdesc/desc -v "concat(@id,' dbi=',@dbi,' ')" -m related -v @objectid -i "number(count(./preceding-sibling::related))+1<number(count(./../related))" -o ", " --else -n -b file.xml
butwba10.1.wc.01 dbi=BUTWBA10.1.1.WC bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12
butwba10.1.wc.02 dbi=BUTWBA10.1.2.WC but621r.1.penc.01, bb435.1.comdes.03
Upvotes: 0
Reputation: 246744
Agree with sputnick that XSLT is the right tool. Nevertheless, here is a Perl answer using an XML token parser. It has the advantage that it only has to process each file once, instead of invoking xmlstarlet repeatedly:
#!perl
use strict;
use warnings;
use XML::Parser;

# State shared between the two handlers (boo, global variables)
my (@related, @desc);

# Called for every opening tag
sub start {
    my ($x, $elem, %attrs) = @_;
    if ($elem eq "desc") {
        @desc    = @attrs{'id', 'dbi'};  # remember id and dbi of the current <desc>
        @related = ();                   # start a fresh list of related ids
    }
    elsif ($elem eq "related") {
        push @related, $attrs{objectid};
    }
}

# Called for every closing tag; a closing </desc> emits one output line
sub end {
    my ($x, $elem) = @_;
    if ($elem eq "desc") {
        printf qq{%s dbi="%s" related="%s"\n}, @desc, join(', ', @related);
    }
}

my $parser = XML::Parser->new( Handlers => {Start => \&start, End => \&end} );
$parser->parsefile($ARGV[0]);
in action:
$ perl parse.pl file
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
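For comparison, the same "read the file once and group the related values per desc" idea can be sketched with Python's standard library. This is only an illustration, not part of the Perl answer above: the names summarize and SAMPLE are made up for the example, and it assumes the document fits in memory and that bad is the root element, as in the question.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question (hypothetical variable name).
SAMPLE = """<bad>
<objdesc>
<desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
<physdesc>adfa;sdfkjad</physdesc>
<related objectid="bb435.1.comdes.02"/>
<related objectid="but614r.1.penc.01"/>
<related objectid="but611.1.wc.01"/>
<related objectid="but612.1.wd.01"/>
<related objectid="bb515.1.comb.12"/>
</desc>
<desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
<physdesc>alkdjfa;sfjsdf</physdesc>
<related objectid="but621r.1.penc.01"/>
<related objectid="bb435.1.comdes.03"/>
</desc>
</objdesc>
</bad>"""


def summarize(xml_text):
    """Return one output line per <desc>, with its own related ids."""
    root = ET.fromstring(xml_text)  # root is the <bad> element
    lines = []
    for desc in root.findall("./objdesc/desc"):
        # Only the <related> children of this particular <desc>
        related = [r.get("objectid", "") for r in desc.findall("related")]
        lines.append('{} dbi="{}" related="{}"'.format(
            desc.get("id"), desc.get("dbi"), ", ".join(related)))
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

To run it over a directory, one could read each *.xml file and pass its contents to summarize; that loop is left out here to keep the sketch short.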
Upvotes: 1
Reputation: 184955
#!/bin/bash
for x in *.xml; do
    # Number of <desc> elements in this file
    count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
    for ((i=1; i<=count; i++)); do
        # Select the id and the related values of the i-th <desc> only
        id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
        arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
        cat<<EOF
$id related="$(perl -e 'print join ", ", @ARGV' "${arr[@]}")"
EOF
    done
done
=)
It seems like this is a job for XSLT. But, OK, shell can handle this too...
Can you do the rest for dbi? It's better to try to understand what is involved here than to just cut and paste.
Upvotes: 1