Scrapy, python: Unable to extract data using xpath seen in firebug

Question

I am fairly new to web scraping, Scrapy, and Python. I am trying to scrape data from this web page.

I want to extract the email address given in the footer of the page (info@bikramyogasg.com) and have tried two XPaths in a Scrapy spider:

Relative: id("gkFooterNav")/div/p/span/a/text()
Absolute: /html/body/div[4]/div[1]/div/div/div/p/span/a/text()

I have tried these XPaths with and without the final text() component. Neither has worked; the spider returns an empty list.

However, when I check them with XPath Checker, I get the correct value. I can't figure out what is going wrong here. Can anyone help, please?

Thanks, Tuhina

GHajba · Accepted Answer

If you parse the site and look at the contents that Scrapy actually receives, you see this message from the website where the email address should be:

This e-mail address is being protected from spambots. You need JavaScript enabled to view it.

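So you need to execute the JavaScript to get access to the email address. Scrapy does not run JavaScript, whereas Firebug and XPath Checker show the page after the browser has run it, which is why the XPaths match there but not in the spider. Alternatively, you could extract the email address from the JavaScript above this text and decode it yourself, without executing any JavaScript.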
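
As a rough sketch of the second approach: the snippet below assumes the page uses a Joomla-style cloaking script, in which the address is built from quoted fragments assigned to a variable whose name starts with "addy", with some characters written as HTML numeric entities. The sample script, the variable name addy1234, and the helper decode_cloaked_email are all made up for illustration; the real script on the page may be laid out differently, so check it before relying on this.

```python
import html
import re

# Hypothetical sample of a cloaking script (not copied from the real page).
# Joomla-style cloaking splits the address into quoted fragments and
# encodes some characters as HTML numeric entities.
SAMPLE_SCRIPT = """
var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
var path = 'hr' + 'ef' + '=';
var addy1234 = '&#105;nf&#111;' + '&#64;';
addy1234 = addy1234 + 'b&#105;kr&#97;my&#111;g&#97;sg' + '&#46;' + 'c&#111;m';
"""


def decode_cloaked_email(script_text):
    """Rebuild the address from the quoted fragments on 'addy...' lines."""
    pieces = []
    for line in script_text.splitlines():
        # Only consider lines that assign to a variable starting with "addy".
        match = re.match(r"\s*(?:var\s+)?addy\w*\s*=(.*)", line)
        if match:
            # Collect every single-quoted fragment on the right-hand side.
            pieces.extend(re.findall(r"'([^']*)'", match.group(1)))
    # Join the fragments and turn entities such as &#64; into characters.
    return html.unescape("".join(pieces))


if __name__ == "__main__":
    print(decode_cloaked_email(SAMPLE_SCRIPT))
```

In a spider, the script text would first have to be pulled out of the response, for example with an XPath ending in script/text() that targets the footer; the exact location of the script element on this page is an assumption and should be checked against the downloaded HTML.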
