Reputation: 9
I am on the website
http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b
and trying to scrape the data from the tables. When I pull the xpath from one entry, say the pitcher "Terry Mulholland," I retrieve this:
pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tbody/tr[2]/td/a)
When I try to print pitcher[0].text
for pitcher in printers, I get []
rather than the text
, Any idea why?
Upvotes: 0
Views: 291
Reputation: 36715
The problem is, last tbody
doesn't exist in the original source. If you get that xpath via some browser, keep in mind that browsers can guess and add missing elements to make html valid.
Removing the last tbody
resolves the problem.
In : import lxml.html as html
In : site = html.parse("http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b")
In : pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tr[2]/td/a")
In : pitchers[0].text
Out: 'Terry Mulholland'
But I need to add that, the xpath expression you are using is pretty fragile. One div
added in some convenient place and now you have a broken script. If possible, try to find better references like id
or class
that points to your expected location.
Upvotes: 1