XPATH works in Chrome, but not in Scrapy

Question

I tried to scrape a page. Sorry, I can't disclose the link because of my job's non-disclosure agreement.

 print(response.xpath('//tr'))

Strangely, the XPath only works in Chrome DevTools, not in Scrapy. I checked the scraped HTML via response.body, and it looks normal.

Aminah Nuraini · Accepted Answer

Found the answer. It turns out the HTML is broken and Scrapy can't repair it on its own, so it needs help from Beautiful Soup. This is how I do it:

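from bs4 import BeautifulSoup
from scrapy.selector import Selector

# Let Beautiful Soup (with the lxml parser) re-serialize the broken markup
fixed_html = str(BeautifulSoup(response.body, "lxml"))

# Query the repaired HTML instead of the original response
print(Selector(text=fixed_html).xpath('//*'))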
