user1082471

Reputation: 9

Trouble getting text from an XPath entry in Python

I am on the website

http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b

and trying to scrape the data from the tables. When I pull the xpath from one entry, say the pitcher "Terry Mulholland," I retrieve this:

pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tbody/tr[2]/td/a")

When I try to print the text of each entry in pitchers, I get [] rather than the text. Any idea why?

Upvotes: 0

Views: 291

Answers (1)

Avaris

Reputation: 36715

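The problem is that the last tbody doesn't exist in the original source. If you got that XPath from a browser, keep in mind that browsers guess at and insert missing elements (such as tbody) to make the HTML valid, so the path they report may not match the raw page that your script parses.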

Removing the last tbody resolves the problem.

In : import lxml.html as html

In : site = html.parse("http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b")

In : pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tr[2]/td/a")

In : pitchers[0].text
Out: 'Terry Mulholland'
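
The same effect can be reproduced offline. This is a minimal sketch that assumes a simplified, made-up table rather than the real page markup: lxml keeps the tree as it is written in the source and does not insert a tbody, so the path that includes one matches nothing.

```python
import lxml.html as html

# Simplified, made-up markup (not the real page): a table with no <tbody>.
SOURCE = """
<html><body><div>
  <table>
    <tr><td>Pitcher</td></tr>
    <tr><td><a href="#">Terry Mulholland</a></td></tr>
  </table>
</div></body></html>
"""

doc = html.fromstring(SOURCE)

# Path as a browser might report it, with a tbody the source never had.
with_tbody = doc.xpath("/html/body/div/table/tbody/tr[2]/td/a")
# Same path with the tbody step removed.
without_tbody = doc.xpath("/html/body/div/table/tr[2]/td/a")

print(with_tbody)             # an empty list: nothing matches
print(without_tbody[0].text)  # the link text from the second row
```

An empty result from xpath() is therefore a hint to compare the expression against the raw source (for example via "view source") rather than the browser's element inspector.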

One more thing: the XPath expression you are using is pretty fragile. A single div added somewhere above the table is enough to break the script. If possible, anchor the expression on something more stable, such as an id or class near the data you want.
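
As a rough illustration of anchoring on an attribute instead of a position, here is a sketch on made-up markup. The id "hr_log", the class "stats" and the second pitcher name are hypothetical and not taken from the real page; check the actual source for what is available.

```python
import lxml.html as html

# Made-up markup: the id "hr_log" and class "stats" are assumptions
# for illustration, not attributes known to exist on the real page.
SOURCE = """
<html><body>
  <div><div>
    <table id="hr_log" class="stats">
      <tr><th>Pitcher</th></tr>
      <tr><td><a href="#">Terry Mulholland</a></td></tr>
      <tr><td><a href="#">Another Pitcher</a></td></tr>
    </table>
  </div></div>
</body></html>
"""

doc = html.fromstring(SOURCE)

# Find the table by id, then take every link inside a cell.
# The // steps match whether or not a tbody or extra div is present.
links = doc.xpath("//table[@id='hr_log']//td/a")
names = [a.text for a in links]

print(names)
```

Because the expression no longer counts div positions from the root, adding or removing wrapper elements around the table does not change the result.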

Upvotes: 1
