Siktime

Reputation: 51

How to scrape data from a website using admin-ajax.php with Scrapy

I am trying to scrape the reviews of Unibet casino from this website: https://casinoplacard.com/unibet-casino-reviews-and-bonuses/

As I did for other review sources, I used Scrapy in Python to scrape the reviews with the code below:

import scrapy
from urllib.parse import urlparse


class slotRunner_spyder(scrapy.Spider):
    name = "slotRunner_spyder"
    start_urls = [
        'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
    ]
    # Counts the review blocks found in the HTML that Scrapy downloads
    count = 0

    def parse(self, response):
        parsed_uri = urlparse(response.url)
        domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

        for review in response.css('div.rwp-users-reviews > div.rwp-u-review'):
            self.count += 1
            yield {
                'name': review.css('td a::text').extract_first(),
                'date': review.css('td small::text').extract_first(),
                'review': review.css('div.rwp-u-review__content > div.rwp-u-review__comment').extract(),
                'url': response.url
            }
        print(self.count)

But it does not work for that website. To understand why, I added a counter (self.count) and discovered that the loop runs only 1 iteration, which is not what I expected.

Then I spent some time studying the website in DevTools and discovered that when the page loads, an XHR POST request is sent automatically to the URL: https://casinoplacard.com/wp-admin/admin-ajax.php

Looking into that request, I found the data for all 182 reviews under:

Preview >> Data >> Reviews

So could you please help me understand how to retrieve that data?

Thank you very much!

Upvotes: 1

Views: 1332

Answers (1)

Siktime

Reputation: 51

I finally found a way to do it. I am sure this is not the best way, but it does what I wanted.

As I said in my question, the Preview tab showed all the data I needed, so I just had to fetch it. I understood that when the page loads, that XHR POST request is made automatically, so I simply made Python send the same request.

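import requests

url = 'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
ajax_URL = 'https://casinoplacard.com/wp-admin/admin-ajax.php'
# param (form data) and headers (User-Agent) are copied from DevTools, see below

s = requests.Session()
# Load the page first so the session keeps any cookies it sets
s.get(url)
# Imitate the XHR POST request the page makes on load
r = s.post(ajax_URL, data=param, headers=headers)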
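
The body of that POST response is JSON, so the reviews can be pulled out of it once it is decoded. Here is a minimal sketch of that step. It assumes the layout I saw in the Preview tab (Data >> Reviews) maps to the keys `data` and `reviews`; those key names, the helper `extract_reviews` and the sample body are my own illustration, so check them against the real response.

```python
import json


def extract_reviews(body):
    """Return the list of reviews found in the admin-ajax.php JSON body.

    Assumes a layout like {"data": {"reviews": [...]}}; the exact key
    names are an assumption and may differ in the real response.
    """
    payload = json.loads(body)
    data = payload.get("data") or {}
    reviews = data.get("reviews") or []
    # In case the reviews come back keyed by id (a dict) rather than a list.
    if isinstance(reviews, dict):
        reviews = list(reviews.values())
    return reviews


# Hypothetical body, shaped like the assumed layout above.
sample_body = '{"success": true, "data": {"reviews": [{"id": 1}, {"id": 2}]}}'
print(len(extract_reviews(sample_body)))
```

With the real request, the function would be called on the text of the POST response instead of `sample_body`.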

You get the parameters from the Headers tab in DevTools: the Form Data section is your param. The headers also come from the Headers tab: look for User-Agent and paste its value into headers. The AJAX URL is the one I gave in my question.
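
If you copy the form data from DevTools in its raw URL-encoded form, it can be turned into the dict passed as `data`. This is only a sketch: the helper `form_data_to_dict` is my own, and the field names and values below are placeholders, not the ones this site actually sends.

```python
from urllib.parse import parse_qsl


def form_data_to_dict(raw_form_data):
    """Turn a URL-encoded form body copied from DevTools into a dict."""
    # keep_blank_values keeps fields that were sent with an empty value.
    return dict(parse_qsl(raw_form_data, keep_blank_values=True))


# Hypothetical values: copy the real ones from the Headers tab.
raw_form_data = "action=example_action&post_id=123&nonce="
param = form_data_to_dict(raw_form_data)
headers = {"User-Agent": "Mozilla/5.0 (placeholder, paste your own)"}
print(param)
```

A dict keeps only one value per field name. If a field is repeated in the form data, pass the list of pairs from `parse_qsl` directly, since requests also accepts a list of tuples for `data`.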

Hope that will help someone.

Upvotes: 2
