In the XML below, using Perl or Python (whichever is faster), I want to get all nodes/node names that have attribute1 set to "characters" and that either have attribute2 set to something other than "chr" or have no attribute2 at all. Please keep in mind that my XML can have 500 nodes, so kindly suggest a fast way to get all such nodes.
<NODE attribute1="characters" attribute2="chr" name="node1">
<content>
value1
</content>
</NODE>
<NODE attribute1="camera" name="node2">
<content>
value2
</content>
</NODE>
<NODE attribute1="camera" attribute2="car" name="node3">
<content>
value2
</content>
</NODE>
As you've tagged this as perl/python, I shall offer a perlish approach.
Perl has a nice library called XML::Twig, which I really like for parsing XML.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $parser = XML::Twig->new();
# Would probably use parsefile instead, e.g.:
# my $parser = XML::Twig->new->parsefile('your_file.xml');
{
    local $/;    # slurp mode: read the whole __DATA__ section at once
    $parser->parse(<DATA>);
}
# Iterate over all the elements directly under the root.
foreach my $element ( $parser->root()->children() ) {
    # attribute1 must be "characters", and attribute2 must be
    # either absent or set to something other than "chr".
    my $att1 = $element->att('attribute1');
    my $att2 = $element->att('attribute2');
    if ( defined $att1
        and $att1 eq 'characters'
        and ( not defined $att2 or $att2 ne 'chr' ) )
    {
        # Extract the name if the condition matches.
        print $element->att('name'), "\n";
    }
}
__DATA__
<DATA>
<NODE attribute1="characters" attribute2="chr" name="node1">
<content>
value1
</content>
</NODE>
<NODE attribute1="camera" name="node2">
<content>
value2
</content>
</NODE>
<NODE attribute1="camera" attribute2="car" name="node3">
<content>
value2
</content>
</NODE>
</DATA>
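For comparison, here is a minimal sketch of the same iterate-and-test approach in Python, using only the standard-library xml.etree.ElementTree module. It assumes the NODE elements are wrapped in a single root element (as the __DATA__ section above does with <DATA>), and it adds one hypothetical node, node4, purely so that something in the sample actually matches.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a <DATA> root element as in the Perl
# answer. node4 is a hypothetical extra node added so that something matches.
SAMPLE = """<DATA>
<NODE attribute1="characters" attribute2="chr" name="node1">
<content>value1</content>
</NODE>
<NODE attribute1="camera" name="node2">
<content>value2</content>
</NODE>
<NODE attribute1="camera" attribute2="car" name="node3">
<content>value2</content>
</NODE>
<NODE attribute1="characters" name="node4">
<content>value4</content>
</NODE>
</DATA>"""


def matching_names(root):
    """Return the name of every direct child of root that meets the condition."""
    names = []
    for element in root:  # direct children, like root()->children() in Twig
        # get() returns None when the attribute is missing, and None != "chr",
        # so this one comparison covers both "absent" and "not chr".
        att2 = element.get("attribute2")
        if element.get("attribute1") == "characters" and att2 != "chr":
            names.append(element.get("name"))
    return names


root = ET.fromstring(SAMPLE)
print(matching_names(root))  # ['node4']
```

Because Element.get() returns None for a missing attribute, the Python version needs no separate "is it defined?" check, unlike the Perl one.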
What you are looking for is an XPath expression:
//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]
The not(@attribute2) branch is needed because @attribute2!="chr" is false when the attribute does not exist at all.
A quick test with xmllint:
kent$ cat f.xml
<root>
<NODE attribute1="characters" attribute2="chr" name="node1">
<content>
value1
</content>
</NODE>
<NODE attribute1="camera" name="node2">
<content>
value2
</content>
</NODE>
<NODE attribute1="camera" attribute2="car" name="node3">
<content>
value2
</content>
</NODE>
</root>
kent$ xmllint --xpath '//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]' f.xml
With this f.xml nothing matches: the only NODE whose attribute1 is "characters" is node1, and its attribute2 is "chr", so it is correctly excluded.
If you only want to extract the value of the name attribute, you can use this XPath:
//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]/@name
or
string(//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]/@name)
Testing again with xmllint:
kent$ xmllint --xpath '//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]/@name' f.xml
kent$ xmllint --xpath 'string(//NODE[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]/@name)' f.xml
The /@name form prints each matching attribute as name="...", while string(...) prints only the bare value of the first match in document order.
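If you are in Python without lxml, the standard-library xml.etree.ElementTree can do part of the XPath approach above. Its XPath support is limited (it has attribute predicates such as [@attribute1='characters'] but no functions like not()), so this sketch uses findall() for the attribute1 test, checks attribute2 in Python, and then builds rough equivalents of the /@name and string(.../@name) forms. The nodes node4 and node5 are hypothetical, added so that something matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's nodes plus two extra "characters" nodes.
DOC = ET.fromstring(
    "<root>"
    '<NODE attribute1="characters" attribute2="chr" name="node1"/>'
    '<NODE attribute1="camera" name="node2"/>'
    '<NODE attribute1="camera" attribute2="car" name="node3"/>'
    '<NODE attribute1="characters" attribute2="car" name="node4"/>'
    '<NODE attribute1="characters" name="node5"/>'
    "</root>"
)

# ElementTree's path syntax handles the attribute1 predicate (".//" plays
# the role of "//"), but not not(), so attribute2 is checked in Python.
candidates = DOC.findall(".//NODE[@attribute1='characters']")
matches = [n for n in candidates if n.get("attribute2") != "chr"]

# Rough equivalent of .../@name: every matching name value.
names = [n.get("name") for n in matches]
# Rough equivalent of string(.../@name): first match, or "" if none.
first_name = names[0] if names else ""

print(names)       # ['node4', 'node5']
print(first_name)  # node4
```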
Use the lxml module:
content = """
<body>
<NODE attribute1="characters" attribute2="chr" name="node1">
<content>
value1
</content>
</NODE>
<NODE attribute1="camera" name="node2">
<content>
value2
</content>
</NODE>
<NODE attribute1="camera" attribute2="car" name="node3">
<content>
value2
</content>
</NODE>
<NODE attribute1="characters" attribute2="car" name="node3">
<content>
value2
</content>
</NODE>
<NODE attribute1="characters" name="node3">
<content>
value2
</content>
</NODE>
</body>
"""
from lxml import etree

root = etree.fromstring(content)
# attribute1 must be "characters"; attribute2 must be absent or not "chr".
matches = root.xpath('//*[@attribute1="characters" and (not(@attribute2) or @attribute2!="chr")]')
for node in matches:
    print(node.tag, node.attrib)
output:
$ python test.py
NODE {'attribute2': 'car', 'attribute1': 'characters', 'name': 'node3'}
NODE {'attribute1': 'characters', 'name': 'node3'}
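On speed: 500 nodes is a small document, and a full in-memory parse with any of the approaches here should be quick. If the file ever grows to the point where holding the whole tree is a concern, one option (a different technique from the lxml answer above) is to stream it with the standard library's xml.etree.ElementTree.iterparse and discard each NODE once it has been checked. A minimal sketch, with a hypothetical in-memory sample standing in for a real file:

```python
import io
import xml.etree.ElementTree as ET


def iter_matching_names(source):
    """Yield the name of each matching NODE while streaming the document.

    source can be a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "NODE":
            # A missing attribute2 gives None, which is != "chr".
            if elem.get("attribute1") == "characters" and elem.get("attribute2") != "chr":
                yield elem.get("name")
            elem.clear()  # free the finished NODE's children and attributes


# Hypothetical sample; in practice you would pass a path like "your_file.xml".
data = io.BytesIO(b"""<body>
<NODE attribute1="characters" attribute2="chr" name="node1"><content>value1</content></NODE>
<NODE attribute1="characters" attribute2="car" name="node4"><content>value4</content></NODE>
<NODE attribute1="characters" name="node5"><content>value5</content></NODE>
</body>""")
print(list(iter_matching_names(data)))  # ['node4', 'node5']
```

Note that the name is read before elem.clear() runs, since clearing also removes the element's attributes.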