Jeff Erickson

Reputation: 3893

Trying to read value at xpath

I'm trying to get the value of the School District listed on this website: http://gis.nyc.gov/dcp/at/f1.jsp?submit=true&house_nbr=310&street_name=Lenox+Avenue&boro=1

I used Firebug to get the XPath of that value: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]

I would like to read that value with Perl, so I wrote the following code:

#!/usr/bin/perl -w

use HTML::TreeBuilder::XPath;
use Data::Dumper;

my $tree= HTML::TreeBuilder::XPath->new;

$tree->parse_file("test.html");

my @nb=$tree->findvalue( '/html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]');

print Dumper(@nb);

But it just prints $VAR1 = ''; (an empty string).

Any suggestions? To get this to run, I just copied the page source from the website into test.html.

Thank you!

Upvotes: 1

Views: 410

Answers (1)

ikegami

Reputation: 386371

The start tag of certain HTML elements (HTML, HEAD, BODY and TBODY) is optional. Take a look at this snippet:

...<table><tr><td>Foo</td></tr></table>...

According to the HTML specification, that snippet represents four elements:

TABLE
   TBODY
      TR
         TD

Firefox creates all four elements, so it gives the following xpath for the TD element:

.../table/tbody/tr/td

HTML::TreeBuilder probably doesn't create elements when their start tags have been omitted, so it only creates three elements for that snippet:

TABLE
   TR
      TD

You'd need to use the following xpath to locate the TD element:

.../table/tr/td

I bet you'll get results if you remove the tbody steps from your xpath, as the TBODY elements most likely do not appear in the file and so are not in the parsed tree.
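As a rough sketch of that suggestion, the script below parses the same kind of snippet from an inline string and runs the browser-style path and a tbody-free path side by side. The variable names ($html, $browser_xpath, $parser_xpath) and the shortened markup are made up for illustration, and the regex assumes the tbody steps carry no predicates such as tbody[1]. If HTML::TreeBuilder really does skip the implied TBODY, the first lookup should come back empty and the second should find the cell.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for test.html: a table with no <tbody> tag in the markup.
my $html = '<html><body><table><tr><td>Foo</td></tr></table></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is finalized

# XPath as a browser tool reports it, including the implied tbody step.
my $browser_xpath = '/html/body/table/tbody/tr/td';

# Remove every plain "/tbody" step so the path matches the markup as written.
(my $parser_xpath = $browser_xpath) =~ s{/tbody(?=/|$)}{}g;

# findvalue returns a single string, so a scalar is enough to hold it.
my $with_tbody    = $tree->findvalue($browser_xpath);
my $without_tbody = $tree->findvalue($parser_xpath);

print "path used:     $parser_xpath\n";
print "with tbody:    '$with_tbody'\n";
print "without tbody: '$without_tbody'\n";

$tree->delete;    # free the tree explicitly
```

The same substitution can be applied to the long path from the question before passing it to findvalue, rather than editing the path by hand.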

Upvotes: 3
