Reputation: 6713
I want to get the link to the next page of results from https://www.google.com.tw/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test
But my code does not work.
Please guide me. Thank you so much.
scrapy shell "https://www.google.com.tw/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test"
response.xpath("//a[@id='pnnext']/@href")
Upvotes: 2
Views: 508
Reputation: 5814
Here is the working code:
scrapy shell "https://www.google.com.tw/search?q=test"
response.xpath("//a[@id='pnnext']/@href")
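Once the XPath matches, the extracted href is usually a relative path, so it has to be resolved against the page URL before it can be requested. Below is a minimal sketch of that step using only the standard library. The markup in SAMPLE_HTML is a hypothetical, simplified stand-in for the real pagination block, and next_page_url is an illustrative helper name, not part of Scrapy. Inside a Scrapy callback, response.urljoin plays the same role as urljoin here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified stand-in for the pagination markup (an assumption,
# not the real page source).
SAMPLE_HTML = """
<div id="nav">
  <a id="pnnext" href="/search?q=test&amp;start=10"><span>Next</span></a>
</div>
"""


def next_page_url(page_url, markup):
    """Return the absolute URL of the 'next' link, or None if there is none."""
    root = ET.fromstring(markup)
    # Same idea as //a[@id='pnnext']/@href, written for ElementTree.
    link = root.find(".//a[@id='pnnext']")
    if link is None:
        return None
    # Resolve the relative href against the URL of the current page.
    return urljoin(page_url, link.get("href"))


print(next_page_url("https://www.google.com.tw/search?q=test", SAMPLE_HTML))
```

Returning None when the link is missing gives the caller a simple way to detect the last page.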
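The issue was in the way you were making the request to Google: everything after the # in your URL is a fragment, which is never sent to the server, so the q=test part never reached Google and Scrapy fetched the search home page instead of a results page.
In any case, be aware of Google's policies regarding its search service.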
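To see the difference between the two URLs, it helps to split them the way an HTTP client does. The sketch below uses the standard library only; query_sent_to_server is an illustrative helper name introduced here. The fragment (the part after #) stays in the client, so only the query string is part of the request.

```python
from urllib.parse import parse_qs, urlsplit


def query_sent_to_server(url):
    """Return the query parameters that are part of the HTTP request."""
    # The fragment is split off separately and is not included in the query.
    return parse_qs(urlsplit(url).query)


original = ("https://www.google.com.tw/webhp"
            "?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=test")
working = "https://www.google.com.tw/search?q=test"

# In the original URL the search term lives only in the fragment.
print(urlsplit(original).fragment)
print(query_sent_to_server(original).get("q"))
# In the working URL the search term is a real query parameter.
print(query_sent_to_server(working).get("q"))
```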
Google's Custom Search Terms of Service (TOS) can be found at http://www.google.com/cse/docs/tos.html.
UPDATE: I wrote a spider to test this issue in more depth.
It is not Pythonic at all (improvements are welcome), but I was interested in the mechanism for dealing with Google results.
As previous comments suggested, a test for the internationalization of the interface is needed.
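import time

import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By


class googleSpider(scrapy.Spider):
    # A plain Spider is used because CrawlSpider reserves parse() for its own rules.
    name = "googlish"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # Type the query into the search box; the trailing newline submits it.
        search_box = self.driver.find_element(By.NAME, "q")
        search_box.send_keys("scrapy\n")
        time.sleep(4)  # crude wait for the results page to load
        while True:
            # Print the text of every result block on the current page.
            for element in self.driver.find_elements(By.XPATH, "//div[@class='rc']"):
                print(element.text + "\n")
            # find_elements returns an empty list (it does not raise) on the last page.
            next_links = self.driver.find_elements(By.ID, "pnnext")
            if not next_links:
                break
            next_links[0].click()
            time.sleep(5)  # crude wait for the next page to load
        self.driver.quit()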
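The fixed time.sleep calls in the spider either waste time or are too short on a slow connection. One alternative is to poll for a condition with a timeout. The helper below is a generic sketch in plain Python; wait_until and its parameters are names introduced here for illustration, not a Selenium API. Selenium itself ships WebDriverWait in selenium.webdriver.support.ui for the same purpose, which is probably the better choice in real code.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses.

    Returns the truthy value, or None if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Assumed usage inside the spider (driver is the Selenium driver):
#     next_links = wait_until(lambda: driver.find_elements("id", "pnnext"))
#     if next_links is None:
#         ...  # no next link appeared in time, treat as the last page
```

Because an empty list is falsy, the same helper covers both "not loaded yet" and "no next page", at the cost of waiting for the full timeout on the last page.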
Upvotes: 2
Reputation: 1956
Can you try the XPath below and let me know what the result is? It looks like the XPath you used is not pointing to the exact location of the web element in the DOM.
//a[@id='pnnext']//span[2]
Upvotes: 0