Aminah Nuraini
Reputation: 19206

XPath works in Chrome, but not in Scrapy

I'm trying to scrape a page. Sorry, I can't share the link because of my employer's non-disclosure agreement.

 print(response.xpath('//tr'))

Strangely, the XPath works in Chrome DevTools but not in Scrapy. I checked the scraped HTML via response.body, and it looks normal.

Upvotes: 0

Views: 884

Answers (1)

Aminah Nuraini
Reputation: 19206

I found the answer. It turns out the page's HTML is broken and Scrapy can't repair it on its own, so I run it through Beautiful Soup first and build a Selector from the repaired markup:

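from bs4 import BeautifulSoup
from scrapy.selector import Selector

# Let Beautiful Soup (with the lxml parser) repair the broken markup.
fixed_html = str(BeautifulSoup(response.body, "lxml"))

# Query the repaired HTML instead of the original response.
print(Selector(text=fixed_html).xpath('//*'))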

Upvotes: 4
