I'm inspecting a page with Chrome DevTools and have the XPath of an element on the page. I deliberately disabled JavaScript so that the DOM doesn't get changed. However, the XPath that Chrome gives for the element returns [] in Scrapy, even though the element certainly exists. What might be the problem?
In particular, the XPath //*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span
should select the price (29 990) on this page: http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351/
$ scrapy shell 'http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351'
In [2]: xp1 = '//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span'
In [3]: aaa = response.xpath(xp1)
In [4]: aaa
Out[4]: []
UPDATE: It turned out there was no tbody in the resulting HTML. Why did Chrome show it in the XPath? How can I get an XPath that matches the real HTML?
Since you mention tbody: a lot of HTML pages omit the tbody tag (its start and end tags are optional in HTML), and Chrome usually compensates by adding a tbody element automatically. If you print the response HTML, you won't find any tbody.
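A minimal stdlib-only sketch of that check, assuming you already have the page source as a string (in the Scrapy shell that would be `response.text`). `TagCollector` and `has_tbody` are hypothetical helper names used only for illustration. Python's `html.parser` reports start tags exactly as they appear in the markup and never builds a DOM, so unlike Chrome it cannot add an implied tbody; that makes it a fair way to see what the raw source, which Scrapy's selectors parse, actually contains.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag exactly as it appears in the raw markup.

    html.parser only tokenizes the source; it does not build a DOM tree,
    so it never inserts implied elements such as <tbody>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag names, so "TBODY" becomes "tbody"
        self.tags.append(tag)


def has_tbody(html: str) -> bool:
    """Return True if the raw HTML source contains a <tbody> start tag."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return "tbody" in collector.tags


if __name__ == "__main__":
    # A table written without <tbody>, as many pages do
    raw = "<table><tr><td><span>29 990</span></td></tr></table>"
    print(has_tbody(raw))  # False
```

A plain substring test such as `"<tbody" in response.text` gives a similar quick answer; the parser-based version just avoids false hits from text inside comments or attribute values being read as tags.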
"I disable javascript deliberately so DOM doesn't get changed"
Besides JavaScript, the DOM can also change because browsers usually have algorithms that fix up the HTML source so that the page can be rendered reasonably well.
"@user3616725, the question is not what to use, but why it doesn't work"
The common case is the one you discovered while I was writing this answer: Chrome added the <tbody> tag automatically. See the following discussion for an explanation of this behavior:
"It turned out there was no tbody in the resulting HTML. Why did Chrome show it in the XPath? How can I get an XPath that matches the real HTML?"
The HTML as rendered by Chrome does contain <tbody>, which is why Chrome showed it in the XPath. Chrome DevTools works against the final DOM, which may differ from the actual HTML source, so you simply can't rely on an XPath copied from Chrome for use in Scrapy.
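For the specific tbody case, one common workaround (not something this answer prescribes) is to delete the tbody steps from the copied XPath before handing it to Scrapy. Below is a minimal sketch, assuming the raw source really has no tbody; `strip_tbody` is a hypothetical helper name. It only handles the tbody difference, so any other DOM fix-up the browser performed would still need a hand-written XPath.

```python
import re

# Matches a '/tbody' location step, optionally with a numeric position
# predicate such as '/tbody[1]', but only when it is a whole step,
# i.e. followed by another '/' or by the end of the expression.
_TBODY_STEP = re.compile(r"/tbody(\[\d+\])?(?=/|$)")


def strip_tbody(xpath: str) -> str:
    """Drop the tbody steps that Chrome adds to XPaths copied from DevTools.

    Assumption: the raw HTML source has no <tbody>. If the server does send
    <tbody>, keep the original expression instead.
    """
    return _TBODY_STEP.sub("", xpath)


if __name__ == "__main__":
    chrome_xpath = ('//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]'
                    '/td/div/table/tbody/tr[2]/td[1]/span')
    print(strip_tbody(chrome_xpath))
    # //*[@id="prddeatailed_container"]/table[1]/tr[1]/td/div/table/tr[2]/td[1]/span
```

In the Scrapy shell you would then try `response.xpath(strip_tbody(xp1))`; whether that matches the price on this particular page has not been verified here.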