Reputation: 19206
I tried to scrape a page. Sorry, I can't disclose the link because of my job's non-disclosure agreement.
print(response.xpath('//tr'))
The weird part is that the XPath works in Chrome DevTools but not in Scrapy. I checked the scraped HTML via response.body, and the HTML looks normal.
Upvotes: 0
Views: 884
Reputation: 19206
Found the answer. It turns out the HTML is broken and Scrapy can't repair it on its own, so it needs help from Beautiful Soup. I do it like this:
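from scrapy.selector import Selector
from bs4 import BeautifulSoup
# Let Beautiful Soup parse the broken markup and write it back out as clean HTML
fixed_html = str(BeautifulSoup(response.body, "lxml"))
# Build a new Selector from the repaired HTML and query that instead of response
print(Selector(text=fixed_html).xpath('//*'))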
Upvotes: 4