Reputation: 71
When I inspect the page (a Google search results page) in the browser, I can select the href I want by searching for //div[@class="r"]/a/@href in the element finder. But with Scrapy, response.xpath('//div[@class="r"]/a/@href') returns nothing. Many other XPaths, such as the one for the link title, also come back empty. Strangely enough, response.xpath('//cite').get() does return something, which is basically the href but incomplete.
If I look at response.body, I can see the href I want deep in the markup, but I have no idea how to access it. Selecting it with the usual CSS or XPath approaches that work on any other website has been futile.
Upvotes: 0
Views: 2143
Reputation: 700
The reason the XPath you're using works in your browser but not on the response is that Google serves a different page when JavaScript is disabled, which is the case for Scrapy but not for your browser. So you'll need an XPath that works for both, or just for the no-JS case.
This one works for no JS but won't work in the browser (if JS is enabled):
//div[@id='ires']//h3/a[1]/@href
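With .get(), this returns the URL of the first result. Note that a[1] picks the first link inside each result's h3, so .getall() returns one URL per result.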
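To sanity-check that expression offline, here is a minimal sketch using only the standard library. The SAMPLE markup and the result_links helper are made up for illustration; the real no-JS page is larger, may not parse as well-formed XML, and its structure can change. ElementTree supports only a subset of XPath, so the attribute is read with .get("href") instead of a trailing /@href.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the no-JS results markup.
# The real page differs; this only mirrors the div#ires > h3 > a nesting.
SAMPLE = """
<html><body>
  <div id="ires">
    <div class="g"><h3><a href="/url?q=https://example.com/first">First</a></h3></div>
    <div class="g"><h3><a href="/url?q=https://example.org/second">Second</a></h3></div>
  </div>
  <div id="footer"><h3><a href="/ignored">Not a result</a></h3></div>
</body></html>
"""


def result_links(markup):
    """Return the href of the first link in each h3 under div#ires."""
    root = ET.fromstring(markup)
    # Same shape as //div[@id='ires']//h3/a[1]; the leading dot makes the
    # path relative to the root element, which ElementTree requires.
    anchors = root.findall(".//div[@id='ires']//h3/a[1]")
    return [a.get("href") for a in anchors]


links = result_links(SAMPLE)
print(links)     # one href per result, in document order
print(links[0])  # the first result only
```

In a spider, the equivalent would be response.xpath("//div[@id='ires']//h3/a[1]/@href").getall() for all results, or .get() for just the first.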
Upvotes: 2
Reputation: 14135
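Try the below. The second XPath starts with a dot so that it is evaluated relative to each matched div rather than against the whole document.
response.xpath("//div[@class='r']").xpath(".//a/@href").extract()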
Upvotes: 0