Javier Jaramillo

Reputation: 11

Scrapy pagination

I am facing an issue with Scrapy pagination.

Here is the HTML:

<a href="" onclick="return false;" class="archive_page_info" 
id="next_achive_button" data-number_page_click="2">NEXT</a>

My Scrapy (Python) approach:

#follow pagination links
next_page_url =   response.css("#next_achive_button").extract_first()
if next_page_url:
   next_page_url = response
   yield scrapy.Request(url=next_page_url, callback=self.parse)

I need some help with this: clicking the NEXT button should load the next page. However, the link has an empty href and onclick="return false;", so there is no URL to follow, and I don't know how to handle it. Could you please give me some hints on how to solve this? Thanks.

Upvotes: 0

Views: 714

Answers (1)

Umair Ayub

Reputation: 21201

Learn how to use Inspect in Chrome (or Firebug if you use Firefox) and open the Network tab.

Enable "Preserve log" and then click the NEXT button; you will see this AJAX POST request being fired:

import requests

cookies = {
    '__unam': '7639673-16295793afa-1ab158d0-2',
    '__utma': '56229998.2107893981.1522926175.1522926175.1522926175.1',
    '__utmc': '56229998',
    '__utmz': '56229998.1522926175.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
    '__utmt': '1',
    '__utmb': '56229998.1.10.1522926175',
    '_first_pageview': '1',
    '__qca': 'P0-1006667184-1522926176233',
    '_jsuid': '585270328',
    'no_trackyy_100969001': '1',
    '__atuvc': '1%7C14',
    '__atuvs': '5ac6025f8fcb6eab000',
    '__tbc': '%7Bjzx%7DIafCBS3b0wpS60-QMtzjGoXcgB2LuqBv13vshDxFKXzUXsJfILJAOyJBA8fT0NrLuAw9JkikXT-lxGWsIpDKlbAJG-Kkoz0pLPzCOLd06VAHO90uO2kuCkU83cHKD7GRaOuzBb9gsuOCm70ShIsd5Q',
    '__pat': '-14400000',
    '__pvi': '%7B%22id%22%3A%22v-2018-04-05-16-02-58-224-d9oQ6Ns4C5cJ79uD-02aeb22c0032f00f6131c0dfebc6b934%22%2C%22domain%22%3A%22.therealdeal.com%22%2C%22time%22%3A1522926179784%7D',
    'xbc': '%7Bjzx%7DPVPoYpACRK8IQh-L66G6Lf11La8U3KDJG42A358oKni-AhQB0dxnTTq_CM95WKsZWHv9fY5JWLkSs5KImxmuRbiETxj07xc3lSSyb53w6bNyQuiiqqE20nVKEniUHDvl9zcfaHGMtBfOKaRmlxOx3TnX34PCjdEudjMUtEx_n9gwp4UEWknk1qUZNvvp7TLK-U4hyrWfMZZezw6MVfaRX5CZGW7Wg6zJ565EiqML9pJ9aeCUAUzgoy7pLjGXLxxtCBVOpfzQAi2b_SJnf2-Pe3KNCXlNvZ7Tr1GylPSVBkP1SYwS237iji2rMBo1YoeZ',
    '_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A795%2C%22y%22%3A3283%2C%22w%22%3A1366%7D%2C%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A800%2C%22y%22%3A3268%2C%22w%22%3A1366%7D%5D%2C%22events%22%3A%5B%5D%7D',
}

headers = {
    'Origin': 'https://therealdeal.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Accept': 'text/html, */*; q=0.01',
    'Referer': 'https://therealdeal.com/new-research/topics/people/',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive',
    'DNT': '1',
}

data = [
  ('action', 'display_filtered_archives_of_trd_topics'),
  ('filtered_type', 'People'),
  ('number_of_click_page', '3'),
]

response = requests.post('https://therealdeal.com/wp-admin/admin-ajax.php', headers=headers, cookies=cookies, data=data)
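Only the page number changes between requests, so the form fields can be generated per page. The following is a minimal sketch of that idea using just the standard library; `build_formdata` is a hypothetical helper name, and the field names and endpoint are copied from the captured request above. Whether the endpoint also needs the cookies and headers shown above is something I have not verified.

```python
from urllib.parse import urlencode

# Endpoint copied from the captured AJAX request above.
AJAX_URL = "https://therealdeal.com/wp-admin/admin-ajax.php"


def build_formdata(page, filtered_type="People"):
    """Return the POST fields for one archive page (hypothetical helper)."""
    return {
        "action": "display_filtered_archives_of_trd_topics",
        "filtered_type": filtered_type,
        # Form values must be strings, so the page number is converted.
        "number_of_click_page": str(page),
    }


if __name__ == "__main__":
    # Print the URL-encoded body that would be sent for pages 2 and 3.
    for page in (2, 3):
        print(urlencode(build_formdata(page)))
```

In a Scrapy spider, the same dict could be passed along the lines of `yield scrapy.FormRequest(AJAX_URL, formdata=build_formdata(page), callback=self.parse)`. On the first page, the next page number can probably be read from the button itself with `response.css("#next_achive_button::attr(data-number_page_click)").get()`; for later pages you may need to increment the number yourself, since the AJAX response might not include the button.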

Upvotes: 1
