Shuvayan Das

Reputation: 1048

How to extract items inside a table using scrapy

I want to extract all the functions listed in the table on this page: python functions list

I have tried using the Chrome developer console to get the exact XPath to use in spider.py, as below:

$x('//*[@id="built-in-functions"]/table[1]/tbody//a/@href')

But this returns a list of all the hrefs (which I think is what the XPath expression refers to).

I believe I need to extract the text from here, but appending /text() to the above XPath returns nothing. Can someone please help me extract the function names from the table?

Upvotes: 0

Views: 294

Answers (2)

Wilfredo

Reputation: 1548

I think this should do the trick:

response.css('.docutils .reference .pre::text').extract()

A non-exact XPath equivalent (which also works in this case) would be:

response.xpath('//table[contains(@class, "docutils")]//*[contains(@class, "reference")]//*[contains(@class, "pre")]/text()').extract()

Upvotes: 1

Umair Ayub

Reputation: 21351

Try this:

for td in response.css("#built-in-functions > table:nth-child(4) td"):
    # extract_first() returns the first match, or None if the cell has none
    name = td.css("span.pre::text").extract_first()
    print(name)

Upvotes: 0
