Brian

Reputation: 101

Perl libXML find node by attribute value

I have a very large XML document that I am iterating through. The XML mostly uses attributes rather than node values. I may need to find numerous nodes in the file to piece together one grouping of information. They are tied together via different ref tag values. Currently, each time I need to locate one of the nodes to extract data from it, I loop through the entire XML and match on the attribute to find the correct node. Is there a more efficient way to select a node with a given attribute value instead of constantly looping and comparing? My current code is so slow it is almost useless.

Currently I am doing something like this numerous times in the same file for numerous different nodes and attribute combinations.

my $searchID = "1234";
foreach my $nodes ($xc->findnodes('/plm:PLMXML/plm:ExternalFile')) {
    my $ID      = $nodes->findvalue('@id');
    my $File    = $nodes->findvalue('@locationRef');
    if ( $searchID eq $ID ) {
        print "The File Name = $File\n";
    }
}

In the above example I am looping and using an "if" to compare for an ID match. I was hoping I could do something like the code below to match the node by attribute instead, and would it be any more efficient than looping?

my $searchID = "1234";
$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));
my $File    = $nodes->findvalue('@locationRef');
print "The File Name = $File\n";

Upvotes: 4

Views: 4977

Answers (4)

Grant McLean

Reputation: 6998

I think you just need to do some study on XPath expressions. For example, you could do something like this:

my $search_id = "1234";
my $query = "/plm:PLMXML/plm:ExternalFile[\@id = '$search_id']";
foreach my $node ($xc->findnodes($query)) {
    # ...
}

In the XPath expression you can also combine multiple attribute checks (inside a double-quoted Perl string, remember to escape each @ as \@), e.g.:

[@id = '$search_id' and contains(@pathname, '.pdf')]
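
As a rough, self-contained sketch of that combined predicate: the sample document, the pathname values, and the namespace URI bound to the plm prefix are all assumptions made up for illustration, so substitute whatever your real file declares.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample data; the namespace URI is an assumption for this sketch.
my $ns  = 'http://www.plmxml.org/Schemas/PLMXMLSchema';
my $xml = <<"XML";
<PLMXML xmlns="$ns">
  <ExternalFile id="1234" locationRef="doc_a.pdf" pathname="/data/doc_a.pdf"/>
  <ExternalFile id="5678" locationRef="model_b.jt" pathname="/data/model_b.jt"/>
</PLMXML>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => $ns);

my $search_id = "1234";

# \@ keeps Perl from treating @id / @pathname as arrays in the double-quoted string.
my $query = "/plm:PLMXML/plm:ExternalFile"
          . "[\@id = '$search_id' and contains(\@pathname, '.pdf')]";

# Collect the locationRef of every element the predicate selects.
my @files = map { $_->getAttribute('locationRef') } $xc->findnodes($query);
print "The File Name = $_\n" for @files;
```

The XPath engine does the filtering here, so the Perl loop only ever sees matching nodes.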

One XPath Tutorial of many

Edit: Another useful resource is the XPath expressions page in "Perl XML::LibXML by Example". The "TRY IT!" buttons on that page link to an "XPath Sandbox" page where you can try the example and edit it. The sandbox also has a "+" button which allows you to work with your own XML document, including one with namespaces (the default example file doesn't have namespaces).

Upvotes: 3

ikegami

Reputation: 385789

Do one pass to extract the information you need into a more convenient format or to build an index.

my %nodes_by_id;
for my $node ($xc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

Then your loops become

my $node = $nodes_by_id{'1234'};

(And stop using findvalue instead of getAttribute.)
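
A minimal end-to-end sketch of the index approach, assuming a small made-up document (the element names, ids, and namespace URI are invented for the example). Each later lookup is a plain hash access rather than another walk over the document.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample data for illustration only.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<PLMXML xmlns="http://www.plmxml.org/Schemas/PLMXMLSchema">
  <ExternalFile id="1234" locationRef="doc_a.pdf"/>
  <ExternalFile id="5678" locationRef="model_b.jt"/>
</PLMXML>
XML

# One pass over the document: map each id value to its element.
# '*' matches elements in any namespace, so no prefix registration is needed here.
my %nodes_by_id;
for my $node ($doc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

# Every subsequent lookup is a hash access.
for my $id (qw(1234 9999)) {
    my $node = $nodes_by_id{$id};
    if ($node) {
        print "$id -> ", $node->getAttribute('locationRef'), "\n";
    }
    else {
        print "$id -> not found\n";
    }
}
```

If the same id can appear on more than one element, the last one wins in this sketch; store an array reference per id if you need all of them.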

Upvotes: 4

user52889

Reputation: 1501

If you will be doing this for lots of IDs, then ikegami's answer is worth reading.

I was hoping I could do something like this below to just match the node by attribute instead

...

$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));

Sort of.

For a given ID, yes, you can do

$nodes = $xc->findnodes("/plm:PLMXML/plm:ExternalFile[\@id=$searchID]");

... provided that $searchID is known to be numeric. Notice that double quotes in Perl make variables interpolate. You should therefore escape the @id, because it is part of the literal XPath string and not a Perl array, whereas $searchID is left unescaped so that its value becomes part of the XPath string.

Note also that because findnodes is called in scalar context here, you get an XML::LibXML::NodeList object, not the actual node and not an arrayref. For an arrayref, use square brackets instead of round ones, as I have done in the next example.

Alternatively, if your search id may not be numeric but you know for sure that it is safe to be put in an XPath string (e.g. doesn't have any quotes), you can do the following:

$nodes = [ $xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]') ];
print $nodes->[0]->getAttribute('locationRef'); # if you're 100% sure it exists

Notice here that the resulting string will enclose the value in quotation marks.

Finally, it is possible to skip straight to:

print $xc->findvalue('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]/@locationRef');

... providing you know that there is only one node with that id.
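
To make the three variants above concrete, here is a small sketch that runs them side by side. The sample document and the namespace URI are assumptions invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample data; the namespace URI is an assumption for this sketch.
my $ns  = 'http://www.plmxml.org/Schemas/PLMXMLSchema';
my $doc = XML::LibXML->load_xml(string => <<"XML");
<PLMXML xmlns="$ns">
  <ExternalFile id="1234" locationRef="doc_a.pdf"/>
</PLMXML>
XML
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => $ns);

my $searchID = "1234";
my $query    = '/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]';

# Scalar context: an XML::LibXML::NodeList object (get_node is 1-based).
my $list      = $xc->findnodes($query);
my $from_list = $list->size ? $list->get_node(1)->getAttribute('locationRef') : undef;

# List context wrapped in [ ]: a plain array reference (0-based).
my $aref      = [ $xc->findnodes($query) ];
my $from_aref = @$aref ? $aref->[0]->getAttribute('locationRef') : undef;

# findvalue: go straight to the string value of the attribute.
my $from_value = $xc->findvalue($query . '/@locationRef');

print "The File Name = $_\n" for $from_list, $from_aref, $from_value;
```

The size and emptiness checks guard against calling getAttribute on a missing node when the id is not present.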

Upvotes: 2

nwellnhof

Reputation: 33618

If you have a DTD for your document that declares the id attribute as type ID, and you make sure the DTD is read when the document is parsed, you can look up the element with a given id efficiently via $doc->getElementById($id).
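
A minimal sketch of this, assuming the ID declaration lives in an internal DTD subset so that no external file needs loading. The document, element name, and id value are invented for the example. Note that values of type ID must be XML names, which cannot start with a digit, so the sketch uses "id1234" rather than a purely numeric id.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up document; the internal DTD subset declares "id" on ExternalFile as type ID.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<!DOCTYPE PLMXML [
  <!ATTLIST ExternalFile id ID #REQUIRED>
]>
<PLMXML>
  <ExternalFile id="id1234" locationRef="doc_a.pdf"/>
  <ExternalFile id="id5678" locationRef="model_b.jt"/>
</PLMXML>
XML

# Direct lookup through the parser's ID table; no XPath expression is involved.
my $node = $doc->getElementById('id1234');
print "The File Name = ", $node->getAttribute('locationRef'), "\n" if $node;
```

With an external DTD you would additionally have to ask the parser to load it; how to do that depends on your setup and is not shown here.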

Upvotes: 1
