user2492364

Reputation: 6713

scrapy xpath: can't get google next page

I want to get the link to the next page of results from https://www.google.com.tw/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test

But my code does not work: the XPath below finds nothing.
Please guide me. Thank you so much.

scrapy shell "https://www.google.com.tw/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test"
response.xpath("//a[@id='pnnext']/@href")

Upvotes: 2

Views: 508

Answers (2)

aberna

Reputation: 5814

Here is the working code:

scrapy shell "https://www.google.com.tw/search?q=test"
response.xpath("//a[@id='pnnext']/@href")

The issue was in the way you were making the request to google.
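To make the difference concrete: in the original URL the search term sits in the fragment (the part after `#`), and a fragment is handled by the browser and not sent to the server, so Scrapy only ever fetched the bare `/webhp` page. Below is a minimal sketch, using only the standard library, that moves the `q` value from the fragment into a `/search` query string. The helper name `to_search_url` is made up for illustration, and the `/search?q=` form is simply the one used in the working command above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_search_url(url):
    """Rebuild a '#q=...' style URL as a '/search?q=...' URL (hypothetical helper)."""
    parts = urlsplit(url)
    # The fragment looks like a query string ("q=test"), so parse it the same way.
    fragment_params = dict(parse_qsl(parts.fragment))
    query = fragment_params.get("q")
    if query is None:
        # Nothing to move; return the URL unchanged.
        return url
    # Keep scheme and host, switch the path to /search, and drop the fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, "/search", urlencode({"q": query}), "")
    )


original = (
    "https://www.google.com.tw/webhp"
    "?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test"
)
fragment = urlsplit(original).fragment
fixed = to_search_url(original)
print(fragment)
print(fixed)
```

The resulting URL can then be passed to `scrapy shell` as in the command above.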

In any case, be aware of Google's policy on using its search.

Google's Custom Search Terms of Service (TOS) can be found at http://www.google.com/cse/docs/tos.html.

UPDATE: I wrote a spider to test this issue in more depth.

It is not Pythonic at all (improvements are welcome), but I was interested in the mechanics of handling Google results.

As previous comments suggested, a test for the internationalization of the interface is still needed.

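import time

import scrapy
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


# A plain Spider is enough here: no CrawlSpider rules are used, and
# CrawlSpider reserves parse() for its own logic.
class GoogleSpider(scrapy.Spider):
    name = "googlish"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in a real browser so the JavaScript-driven results exist.
        self.driver.get(response.url)
        search_box = self.driver.find_element(By.NAME, "q")
        search_box.send_keys("scrapy\n")
        time.sleep(4)  # crude wait for the results page to load
        while True:
            for element in self.driver.find_elements(By.XPATH, "//div[@class='rc']"):
                print(element.text + "\n")
            try:
                # find_element (singular) raises when there is no "next" link;
                # find_elements would return an empty list and loop forever.
                self.driver.find_element(By.ID, "pnnext").click()
            except NoSuchElementException:
                break
            time.sleep(5)  # crude wait for the next page to load

        self.driver.quit()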

Upvotes: 2

Rupesh Shinde

Reputation: 1956

Can you try the XPath below and let me know what the result is? It looks like the XPath you used is not pointing to the exact location of the web element in the DOM.

//a[@id='pnnext']//span[2]

Upvotes: 0
