Learner

Reputation: 1

Scrapy xpath selector doesn't retrieve the element

From this URL: https://www.basketball-reference.com/boxscores/202110190LAL.html, I want to extract the text of the element at this XPath:

//div[@id='div_four_factors']/table/tbody/tr[1]/td[1]

But the result I get is None. In the Scrapy shell I run this:

>>> text = response.xpath("//div[@id='div_four_factors']/table/tbody/tr[1]/td[1]/text()").get()
>>> print(text)
None

I have tried writing the correct XPath for the element I want to retrieve, but I still get no result.

Upvotes: 0

Views: 54

Answers (1)

Alexander

Reputation: 17355

This is because that table (and, it looks like, all the tables on that page) is inserted by JavaScript after the page has loaded. So the XPath path doesn't exist in the raw HTML response you are parsing.

You can see this if you open the page in a web browser, right-click, and select "View Page Source" or something similar. Alternatively, you could just print(response.text), but it won't be formatted and will be difficult to read.
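To check this from the shell instead of the browser, you can search the raw HTML for the table's id and see whether it only appears inside a comment. The sketch below uses only the standard library; `locate_id` is a hypothetical helper, and `RAW_HTML` is a trimmed, made-up stand-in for `response.text`, not the real page. It assumes the id is written as `id="..."` with double quotes, so treat it as a quick heuristic rather than a real HTML parser.

```python
import re

# Hypothetical, heavily trimmed stand-in for response.text (not the real page).
RAW_HTML = """
<div id="all_four_factors">
  <!--
  <div id="div_four_factors"><table><tbody>
    <tr><td data-stat="pace">112.8</td></tr>
  </tbody></table></div>
  -->
</div>
"""

# Matches a whole HTML comment, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", flags=re.DOTALL)


def locate_id(html: str, element_id: str) -> str:
    """Report whether element_id appears in live markup, only inside comments, or not at all."""
    marker = f'id="{element_id}"'  # assumes double-quoted id attributes
    if marker not in html:
        return "absent"
    # Delete every comment, then look for the id again.
    live_markup = COMMENT_RE.sub("", html)
    return "live" if marker in live_markup else "comment-only"


print(locate_id(RAW_HTML, "all_four_factors"))  # live
print(locate_id(RAW_HTML, "div_four_factors"))  # comment-only
print(locate_id(RAW_HTML, "div_nonexistent"))   # absent
```

In the Scrapy shell you would call `locate_id(response.text, "div_four_factors")`; if the explanation above is right, you would expect `"comment-only"`, which is exactly the case where a normal XPath returns None.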

However, it looks like a copy of the table's HTML is included inside an HTML comment right next to where the table appears once rendered, which means you can do this:

In [1]: import re

In [2]: pat = re.compile(r'<!--(.*?)-->', flags=re.DOTALL)

In [3]: text = response.xpath("//div[@id='all_four_factors']//comment()").get()

In [4]: selector = scrapy.Selector(text=pat.findall(text)[0])

In [5]: result = selector.xpath('//tbody/tr[1]/td[1]')

In [6]: result
Out[6]: [<Selector xpath='//tbody/tr[1]/td[1]' data='<td class="right " data-stat="pace">1...'>]

In [7]: result[0].xpath('./text()').get()
Out[7]: '112.8'

In [8]: 
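A different approach, sometimes used for pages like this, is to strip the comment delimiters from the whole document so that your original XPath can match. The sketch below illustrates that idea with only the standard library (`re` and `html.parser`) so it runs without Scrapy. `RAW_HTML`, `uncomment`, `FirstCellParser`, and `first_cell_text` are hypothetical names, the sample HTML is made up, and the parser only approximates `tr[1]/td[1]` by taking the first `<td>` after the container opens. With Scrapy you would instead pass `uncomment(response.text)` to `scrapy.Selector(text=...)` and reuse your original XPath.

```python
import re
from html.parser import HTMLParser

# Hypothetical, trimmed stand-in for response.text (not the real page).
RAW_HTML = """
<div id="all_four_factors">
  <!--
  <div id="div_four_factors"><table><tbody>
    <tr><td data-stat="pace">112.8</td><td data-stat="efg_pct">.523</td></tr>
    <tr><td data-stat="pace">112.8</td><td data-stat="efg_pct">.488</td></tr>
  </tbody></table></div>
  -->
</div>
"""


def uncomment(html: str) -> str:
    """Drop every <!-- and --> marker so commented-out markup becomes live markup."""
    return re.sub(r"<!--|-->", "", html)


class FirstCellParser(HTMLParser):
    """Collects the text of the first <td> that follows the element with container_id."""

    def __init__(self, container_id: str) -> None:
        super().__init__()
        self.container_id = container_id
        self.seen_container = False  # True once the container's start tag is seen
        self.in_td = False           # True while inside the target <td>
        self.done = False            # True after the target <td> has closed
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.container_id:
            self.seen_container = True
        elif tag == "td" and self.seen_container and not self.done:
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.in_td = False
            self.done = True

    def handle_data(self, data):
        if self.in_td:
            self.text += data


def first_cell_text(html: str, container_id: str, strip_comments: bool = True):
    """Return the first cell's text, or None (like .get()) if it was never found."""
    parser = FirstCellParser(container_id)
    parser.feed(uncomment(html) if strip_comments else html)
    parser.close()
    return parser.text.strip() if parser.done else None


print(first_cell_text(RAW_HTML, "div_four_factors"))                        # 112.8
print(first_cell_text(RAW_HTML, "div_four_factors", strip_comments=False))  # None
```

Removing every comment marker is blunt: it also exposes any unrelated commented-out markup on the page, so the targeted comment extraction above is safer if you only need one table.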

Upvotes: 2
