Reputation: 11
I am facing an issue with Scrapy pagination.
Here is the HTML:
<a href="" onclick="return false;" class="archive_page_info"
id="next_achive_button" data-number_page_click="2">NEXT</a>
Here is my Scrapy (Python) approach:
# follow pagination links (this runs inside the spider's parse(self, response) method)
next_page_url = response.css("#next_achive_button::attr(href)").get()
if next_page_url:
    next_page_url = response.urljoin(next_page_url)
    yield scrapy.Request(url=next_page_url, callback=self.parse)
I need help making the spider follow the NEXT button to the next page, the way a click does in the browser. However, the link has an empty href and onclick="return false;", so there is no URL to follow.
I don't know how to solve this. Could you please give me some hints? Thanks.
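To check what the button actually exposes, here is a minimal sketch that parses the snippet above with Python's standard-library `html.parser`. The `NextButtonParser` class is a hypothetical helper written for illustration. It shows that `href` is empty and that the only page information is the `data-number_page_click` attribute:

```python
from html.parser import HTMLParser


class NextButtonParser(HTMLParser):
    """Hypothetical helper: collects the attributes of id="next_achive_button"."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; turn it into a dict
        attr_map = dict(attrs)
        if attr_map.get("id") == "next_achive_button":
            self.attrs = attr_map


html = '''<a href="" onclick="return false;" class="archive_page_info"
id="next_achive_button" data-number_page_click="2">NEXT</a>'''

parser = NextButtonParser()
parser.feed(html)
print(repr(parser.attrs["href"]))              # '' : no URL to follow
print(parser.attrs["data-number_page_click"])  # 2 : the next page number
```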
Upvotes: 0
Views: 714
Reputation: 21201
Learn how to use Inspect (DevTools) in Chrome, or Firebug if you use Firefox.
Open the Network tab, check Preserve log, and then click the next page button. You will see this AJAX POST request being fired, which you can replay with requests:
import requests

# Cookies copied from the browser's request in DevTools
cookies = {
'__unam': '7639673-16295793afa-1ab158d0-2',
'__utma': '56229998.2107893981.1522926175.1522926175.1522926175.1',
'__utmc': '56229998',
'__utmz': '56229998.1522926175.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
'__utmt': '1',
'__utmb': '56229998.1.10.1522926175',
'_first_pageview': '1',
'__qca': 'P0-1006667184-1522926176233',
'_jsuid': '585270328',
'no_trackyy_100969001': '1',
'__atuvc': '1%7C14',
'__atuvs': '5ac6025f8fcb6eab000',
'__tbc': '%7Bjzx%7DIafCBS3b0wpS60-QMtzjGoXcgB2LuqBv13vshDxFKXzUXsJfILJAOyJBA8fT0NrLuAw9JkikXT-lxGWsIpDKlbAJG-Kkoz0pLPzCOLd06VAHO90uO2kuCkU83cHKD7GRaOuzBb9gsuOCm70ShIsd5Q',
'__pat': '-14400000',
'__pvi': '%7B%22id%22%3A%22v-2018-04-05-16-02-58-224-d9oQ6Ns4C5cJ79uD-02aeb22c0032f00f6131c0dfebc6b934%22%2C%22domain%22%3A%22.therealdeal.com%22%2C%22time%22%3A1522926179784%7D',
'xbc': '%7Bjzx%7DPVPoYpACRK8IQh-L66G6Lf11La8U3KDJG42A358oKni-AhQB0dxnTTq_CM95WKsZWHv9fY5JWLkSs5KImxmuRbiETxj07xc3lSSyb53w6bNyQuiiqqE20nVKEniUHDvl9zcfaHGMtBfOKaRmlxOx3TnX34PCjdEudjMUtEx_n9gwp4UEWknk1qUZNvvp7TLK-U4hyrWfMZZezw6MVfaRX5CZGW7Wg6zJ565EiqML9pJ9aeCUAUzgoy7pLjGXLxxtCBVOpfzQAi2b_SJnf2-Pe3KNCXlNvZ7Tr1GylPSVBkP1SYwS237iji2rMBo1YoeZ',
'_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A795%2C%22y%22%3A3283%2C%22w%22%3A1366%7D%2C%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A800%2C%22y%22%3A3268%2C%22w%22%3A1366%7D%5D%2C%22events%22%3A%5B%5D%7D',
}
# Headers copied from the same request; X-Requested-With: XMLHttpRequest marks it as AJAX
headers = {
'Origin': 'https://therealdeal.com',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Accept': 'text/html, */*; q=0.01',
'Referer': 'https://therealdeal.com/new-research/topics/people/',
'X-Requested-With': 'XMLHttpRequest',
'Connection': 'keep-alive',
'DNT': '1',
}
# Form fields sent by the page; number_of_click_page selects which page to load
data = [
('action', 'display_filtered_archives_of_trd_topics'),
('filtered_type', 'People'),
('number_of_click_page', '3'),
]
response = requests.post('https://therealdeal.com/wp-admin/admin-ajax.php', headers=headers, cookies=cookies, data=data)
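To walk through all pages, the same POST can be repeated with an increasing `number_of_click_page`. Below is a minimal sketch using only the standard library. The helper names, the stop condition (an empty response body) and the `max_pages` cap are assumptions for illustration, not behaviour confirmed for this site. The fetch function is passed in so the loop can be tried without network access:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AJAX_URL = "https://therealdeal.com/wp-admin/admin-ajax.php"


def build_form_data(page):
    """Form body for one page, mirroring the fields seen in DevTools."""
    return urlencode([
        ("action", "display_filtered_archives_of_trd_topics"),
        ("filtered_type", "People"),
        ("number_of_click_page", str(page)),
    ])


def fetch_page(page):
    """POST one page to the AJAX endpoint and return the response body as text."""
    req = Request(
        AJAX_URL,
        data=build_form_data(page).encode("utf-8"),  # data makes this a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def iter_pages(fetch=fetch_page, start=2, max_pages=50):
    """Yield (page, body) pairs; stop at an empty body (assumed end) or max_pages."""
    for page in range(start, start + max_pages):
        body = fetch(page)
        if not body.strip():
            break
        yield page, body
```

In a Scrapy spider the same fields can be sent with `scrapy.FormRequest(url=..., formdata={...}, callback=...)`, which POSTs them form-encoded (the `formdata` values must be strings, e.g. `str(page)`); the callback can then yield the request for the following page.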
Upvotes: 1