B B

Reputation: 71

Selecting the first link in a google search

When I inspect the page (a Google search results page) in the browser, I can select the href I want by searching for //div[@class="r"]/a/@href in the element finder. But in Scrapy, response.xpath('//div[@class="r"]/a/@href') returns nothing. Many other XPath expressions, such as the one for the link title, also come back empty. Strangely enough, response.xpath('//cite').get() does return something, which is essentially the href but incomplete.

If I look at response.body, I can see the href I want buried deep in the markup, but I have no idea how to get at it. The usual CSS or XPath selectors that would work on any other website have been futile.

Upvotes: 0

Views: 2143

Answers (2)

Ayoub_B

Reputation: 700

The XPath you're using works in your browser but not on the response because Google serves a different page when JavaScript is disabled. Scrapy does not execute JavaScript, while your browser does, so you'll need an XPath that works in both cases or just in the no-JS case.

This one works for no JS but won't work in the browser (if JS is enabled):

//div[@id='ires']//h3/a[1]/@href

The expression matches the first link inside every result heading, so calling .get() on it will return the URL of the first result.
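
To make the difference concrete, here is a minimal sketch that runs outside Scrapy, using Python's standard-library ElementTree as a stand-in for Scrapy's selectors. The two markup snippets, the example.com URLs, and the helper first_href are hypothetical simplifications for illustration, not Google's real HTML. ElementTree only supports a subset of XPath and cannot select @href directly, so the sketch selects the a element and reads its attribute; in Scrapy you would keep /@href in the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. This is NOT Google's real HTML.
# Shape of the page as seen in a browser with JS enabled (assumed):
JS_PAGE = """
<html><body>
  <div class="r"><a href="https://example.com/first">First</a></div>
  <div class="r"><a href="https://example.com/second">Second</a></div>
</body></html>
"""

# Shape of the page as served without JS (assumed):
NO_JS_PAGE = """
<html><body>
  <div id="ires">
    <h3><a href="https://example.com/first">First</a></h3>
    <h3><a href="https://example.com/second">Second</a></h3>
  </div>
</body></html>
"""

# Paths are written relative to the root (leading "."), as ElementTree requires.
BROWSER_PATH = ".//div[@class='r']/a"
NO_JS_PATH = ".//div[@id='ires']//h3/a[1]"


def first_href(markup, path):
    """Return the href of the first element matching path, or None."""
    root = ET.fromstring(markup)
    node = root.find(path)
    return None if node is None else node.get("href")


print(first_href(JS_PAGE, BROWSER_PATH))     # https://example.com/first
print(first_href(NO_JS_PAGE, BROWSER_PATH))  # None
print(first_href(NO_JS_PAGE, NO_JS_PATH))    # https://example.com/first
```

The middle call is the situation in the question: the selector is fine, but the markup it was written against is not the markup the crawler received.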

Upvotes: 2

supputuri

Reputation: 14135

Try the below.

response.xpath("//div[@class='r']").xpath(".//a/@href").extract()

Upvotes: 0
