Mario Honse

Reputation: 299

XPath from Chrome results in an empty list in scrapy

I'm inspecting a page with Chrome DevTools and copied the XPath of an element on it. I disabled JavaScript deliberately so the DOM doesn't get changed. However, the XPath that Chrome gives for the element returns [] in Scrapy, although the element clearly exists. What might be the problem?

In particular, the XPath //*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span for the page http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351/ should select the price, 29 990.

$ scrapy shell 'http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351'

In [2]: xp1 = '//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span'

In [3]: aaa = response.xpath(xp1)

In [4]: aaa
Out[4]: []

UPDATE: It turned out there was no tbody in the resulting HTML. Why did Chrome show it in the XPath? How can I get an XPath that matches the real HTML?

Upvotes: 0

Views: 1226

Answers (2)

Aminah Nuraini

Reputation: 19206

Since you mention tbody: a lot of HTML pages omit tbody in their tables, and Chrome usually fixes this by adding tbody automatically. If you print the response HTML, you won't find any tbody.
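One way to check this is to count the tbody start tags in the raw markup. The following is only a minimal sketch using Python's standard library. The names TagCounter and count_tag are made up for illustration, and raw_html is a tiny stand-in for the real page source.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name (hypothetical helper class)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser only tokenizes the markup; unlike a browser,
        # it does not insert missing elements such as <tbody>.
        if tag == self.tag:
            self.count += 1


def count_tag(html, tag):
    """Return how many <tag> start tags appear in the raw HTML string."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Stand-in for the raw page source: a table written without <tbody>.
raw_html = "<table><tr><td><span>29 990</span></td></tr></table>"
tbody_count = count_tag(raw_html, "tbody")
```

In the scrapy shell the same check would presumably be count_tag(response.text, "tbody"), assuming a Scrapy version where the response exposes the decoded body as response.text.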

Upvotes: 0

har07

Reputation: 89325

"I disable javascript deliberately so DOM doesn't get changed"

Besides JavaScript, the DOM can also change because browsers usually have algorithms that repair the HTML source so that it can be rendered reasonably well.

"@user3616725, the question is not what to use, but why doesn't it work"

A common case is the one you discovered while I was writing this answer: Chrome added the <tbody> tag automatically. See the following discussion for an explanation of this behavior:

Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?

"It turned out in the result html there was no tbody. Why did Chrome showed it in xpath? How to make it the real html in xpath?"

The HTML as rendered by Chrome does indeed contain <tbody>, which is why Chrome showed it in the XPath. Chrome DevTools works against the final DOM, which may differ from the actual HTML source, so you can't rely on an XPath copied from Chrome working unchanged in Scrapy.
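A possible workaround is to drop the tbody steps from the copied XPath before using it. This is only a sketch: strip_tbody is a hypothetical helper, not part of Scrapy, and it assumes the page source really has no tbody (if the source does contain one, the stripped path will stop matching). The small ElementTree document is a stand-in for the raw source.

```python
import re
import xml.etree.ElementTree as ET

# Matches a "/tbody" location step, with an optional positional predicate like [1].
_TBODY_STEP = re.compile(r"/tbody(\[\d+\])?(?=/|$)")


def strip_tbody(xpath):
    """Remove tbody steps from an XPath copied from the browser (hypothetical helper)."""
    return _TBODY_STEP.sub("", xpath)


chrome_xpath = ('//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div'
                '/table/tbody/tr[2]/td[1]/span')
source_xpath = strip_tbody(chrome_xpath)

# Tiny stand-in for raw page source: a table written without <tbody>.
raw = ET.fromstring("<div><table><tr><td><span>29 990</span></td></tr></table></div>")

# The browser-style path finds nothing; the stripped path finds the span.
with_tbody = raw.findall("./table/tbody/tr/td/span")
without_tbody = raw.findall(strip_tbody("./table/tbody/tr/td/span"))
```

In the scrapy shell, response.xpath(source_xpath) would then be evaluated against the markup as it was actually downloaded. Writing the expression by hand against the raw source is generally more robust than patching a copied one.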

Upvotes: 2
