Reputation: 7028
As a learning exercise, I am trying to extract a list of teams and scores from this page: http://stats.rleague.com/rl/seas/2014.html
I am not getting the results I expect. First, my imports and the page request:
In [1]: from lxml import html
In [2]: import requests
In [3]: page = requests.get('http://stats.rleague.com/rl/seas/2014.html')
In [4]: tree = html.fromstring(page.text)
This is the HTML for the title:
<html><title>Rugby League Tables / Season 2014</title>
And this is the HTML for the teams:
<tr><td width=20%><a href="../teams/souths/souths_idx.html">Souths</a></td><td width=12%>4t 6g </td><td width=5%> 28</td><td><b>Date:</b>Thu 06-Mar-2014 <b>Venue:</b><a href="../venues/stadium_australia.html">Stadium Australia</a> <b>Crowd:</b>27,282</td></tr>
<tr><td width=20%><a href="../teams/easts/easts_idx.html">Sydney Roosters</a></td><td width=12%>1t 2g </td><td width=5%> 8</td><td><b>Souths</b> won by <b> 20 pts</b>
However, both of my XPath queries return empty lists. What am I doing wrong?
In [6]: print(tree)
<Element html at 0x7f518067fc78>
In [7]: titles = tree.xpath('//html[@title]/text()')
In [8]: print(titles)
[]
In [11]: teams = tree.xpath('//tr/td[@href]/text()')
In [12]: print(teams)
[]
Upvotes: 0
Views: 122
Reputation: 368894
Both XPath expressions test for an attribute where the page has an element. Changing them as follows gives the results you want:
# `title` is an element (tag), not an attribute of `html`, so select it by name.
titles = tree.xpath('.//title/text()')
print(titles)
# `td` has no `href` attribute; `href` belongs to the `a` element inside `td`.
teams = tree.xpath('//tr/td/a[@href]/text()')
print(teams)
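To try this without hitting the site, here is a minimal sketch that runs the same kind of query against the rows quoted in the question. The wrapping `<body>`/`<table>` tags, the trimmed second row, and the names `SNIPPET`, `all_links` and `results` are my own assumptions for illustration; the real page may be laid out differently. On these rows, `//tr/td/a[@href]` also matches the venue link, so the sketch restricts the team name to the first cell and reads the score from the third.

```python
from lxml import html

# Rows taken from the question, wrapped in a minimal page (assumed wrapper)
# so the example runs without a network request. The second row is trimmed
# to its first three cells because the question's copy is cut off.
SNIPPET = """
<html><title>Rugby League Tables / Season 2014</title>
<body><table>
<tr><td width=20%><a href="../teams/souths/souths_idx.html">Souths</a></td><td width=12%>4t 6g </td><td width=5%> 28</td><td><b>Date:</b>Thu 06-Mar-2014 <b>Venue:</b><a href="../venues/stadium_australia.html">Stadium Australia</a> <b>Crowd:</b>27,282</td></tr>
<tr><td width=20%><a href="../teams/easts/easts_idx.html">Sydney Roosters</a></td><td width=12%>1t 2g </td><td width=5%> 8</td></tr>
</table></body></html>
"""

tree = html.fromstring(SNIPPET)

# `title` is selected as an element, then its text node is taken.
titles = tree.xpath('//title/text()')

# Every link inside any cell: this includes the venue link as well.
all_links = tree.xpath('//tr/td/a[@href]/text()')

# One (team, score) pair per row: team name from the link in the first cell,
# score from the text of the third cell.
results = []
for row in tree.xpath('//tr[td[1]/a[@href]]'):
    team = row.xpath('td[1]/a/text()')[0]
    score = int(row.xpath('td[3]/text()')[0].strip())
    results.append((team, score))

print(titles)
print(all_links)
print(results)
```

Querying relative to each `tr` keeps a team and its score together, which is harder to guarantee when two flat lists are collected separately.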
Upvotes: 1