Reputation: 61
When I try to change response.url with response.replace before yielding a Request, I get the same URL back. The syntax seems to be correct, though.
print(response.url)
response.replace(url='https://techcrunch.com/search/heartbleed#stq=heartbleed&stp=2')
print(response.url)
next = self.driver.find_element(By.XPATH,"//a[@class='page-link next']")
nextpage = next.get_attribute("href")
yield scrapy.Request(url=nextpage, dont_filter=False)
Notes:
1. I'm assigning the URL twice (obviously not needed if it worked).
2. nextpage is the exact same URL as in the second line of the code.
Output:
https://techcrunch.com/search/heartbleed
https://techcrunch.com/search/heartbleed
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:56740/wd/hub/session/e3ba0740-51cb-11e7-acb6-f1825cec3f42/element {"using": "xpath", "sessionId": "e3ba0740-51cb-11e7-acb6-f1825cec3f42", "value": "//a[@class='page-link next']"}
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:56740/wd/hub/session/e3ba0740-51cb-11e7-acb6-f1825cec3f42/element/:wdc:1497532195411/attribute/href {"sessionId": "e3ba0740-51cb-11e7-acb6-f1825cec3f42", "name": "href", "id": ":wdc:1497532195411"}
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
I have the feeling that this is the reason why I can't move on to other links: the response always stays on the same page instead of following the new links.
Upvotes: 1
Views: 3124
Reputation: 217
I guess the replace method does not operate in place but returns the result instead:
replace([url, status, headers, body, request, flags, cls])
Returns a Response object with the same members, except for those members given new values by whichever keyword arguments are specified.
So I would try something like:
new_response = response.replace(whatever=value)
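
To show the difference between discarding and keeping the return value, here is a minimal sketch that runs without Scrapy installed. FakeResponse is a hypothetical stand-in I made up for illustration, not Scrapy's real Response class; it only mimics the documented "returns a new object" behaviour quoted above.

```python
from dataclasses import dataclass, replace as dc_replace


@dataclass(frozen=True)
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (assumption, not the real class)."""
    url: str
    status: int = 200

    def replace(self, **kwargs):
        # Mimics the documented behaviour: build and return a new object
        # with the given fields changed, leaving this one untouched.
        return dc_replace(self, **kwargs)


NEW_URL = "https://techcrunch.com/search/heartbleed#stq=heartbleed&stp=2"

response = FakeResponse(url="https://techcrunch.com/search/heartbleed")

# What the question's code does: the returned copy is thrown away,
# so the original object still carries the old URL.
response.replace(url=NEW_URL)
unchanged_url = response.url

# Keeping the return value gives an object with the new URL.
new_response = response.replace(url=NEW_URL)

print(unchanged_url)
print(new_response.url)
```

If the real method behaves as the documentation says, the same pattern applies: use the returned object rather than the original one. Note that replacing the URL only changes the attribute on the copy; it does not fetch the new page, so a new Request is still needed for that.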
Upvotes: 3