Reputation: 67
I'm trying to scrape data from a website. I have managed to scrape the data from the first page.
The next page, however, is loaded via AJAX. I set headers for that request, but I wasn't able to get the data from the next page.
If I send the request without the headers, I get the same data, so maybe I didn't set the headers correctly to move to the next page. I used cURL to obtain the headers.
What did I do wrong?
import scrapy


class MenSpider(scrapy.Spider):
    name = "MenCrawler"
    allowed_domains = ['monark.com.pk']

    # define headers, with 'custom_constraint' holding the page number
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36',
        'accept-language': 'en-PK,en-US;q=0.9,en;q=0.8',
        'key': '274246071',
        'custom_constraint': 'custom-filter page=1',
        'view': 'ajax',
        '_': '1618681277011'
    }

    # send the first request
    def start_requests(self):
        yield scrapy.Request(
            url='https://monark.com.pk/collections/t-shirts',
            method='GET',
            headers=self.headers,
            callback=self.update_headers
        )

    # handle the response
    def update_headers(self, response):
        # extract all 12 product URLs from the page
        urls = response.xpath('//h4[@class="h6 m-0 ff-main"]/a/@href').getall()
        for url in urls:
            yield response.follow(url=url, callback=self.parse)

        # extract the infinite-scroll text, expected to be 'LOADING'
        load = response.xpath('//div[@class="pagination"]//span/text()').get()

        # paginate while the 'LOADING' marker is present
        if load == 'LOADING':
            page = 1
            # read the current page number from the headers dictionary
            key = self.headers['custom_constraint']
            current_page = key.split('=')[-1]
            next_pag = page + int(current_page)
            filters = 'custom-filter page=' + str(next_pag)
            self.headers['custom_constraint'] = filters
            # request the same URL again for the next page, BUT THIS IS NOT WORKING FOR ME
            yield scrapy.Request(
                url='https://monark.com.pk/collections/t-shirts',
                method='GET',
                headers=self.headers,
                callback=self.update_headers
            )

    def parse(self, response):
        ........
Upvotes: 0
Views: 297
Reputation: 417
Your code reuses the same key on every request, which may be why the same page loads again. Try removing 'key' from the headers, or work out how the keys are generated.
Below are the keys that I found on an initial inspection of the site's AJAX requests:
https://monark.com.pk/collections/t-shirts?key=172181120&custom_constraint=custom-filter+page=4&view=ajax&_=1618763278994
https://monark.com.pk/collections/t-shirts?key=205204897&custom_constraint=custom-filter+page=5&view=ajax&_=1618763278995
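Note that in both URLs above, key, custom_constraint, view and _ travel in the query string, not in the HTTP headers. A minimal sketch of building such a URL per page is shown here. The helper name build_page_url and its parameters are hypothetical, and it is only an assumption that the site will answer when key is left out or reused, since how the keys are generated is not known.

```python
from urllib.parse import urlencode

BASE_URL = "https://monark.com.pk/collections/t-shirts"


def build_page_url(page, key=None, cache_buster=None):
    """Build the AJAX listing URL for one page (hypothetical helper).

    key and cache_buster are optional because it is not known whether the
    site requires them; pass the values seen in the browser if it does.
    """
    params = {}
    if key is not None:
        params["key"] = key
    # The page number goes into the query string, not into a header.
    params["custom_constraint"] = f"custom-filter page={page}"
    params["view"] = "ajax"
    if cache_buster is not None:
        params["_"] = cache_buster
    # safe="=" leaves the '=' inside the filter value unescaped, matching
    # the URLs observed in the browser; the space is encoded as '+'.
    return f"{BASE_URL}?{urlencode(params, safe='=')}"


if __name__ == "__main__":
    print(build_page_url(4, key="172181120", cache_buster="1618763278994"))
```

In the spider, the returned string would be passed as the url of scrapy.Request, with only 'user-agent' and 'accept-language' left in headers. Because the URL then changes with every page, the request is also less likely to be dropped by Scrapy's default duplicate filter, which does not take headers into account when comparing requests.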
Upvotes: 1