Scraping using BeautifulSoup only gets me 33 responses off of an infinite scrolling page. How do I increase the number of responses?

Question

The website link:

https://collegedunia.com/management/human-resources-management-colleges

The code:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://collegedunia.com/management/human-resources-management-colleges")
c = r.content

soup = BeautifulSoup(c,"html.parser")


all = soup.find_all("div",{"class":"jsx-765939686 col-4 mb-4 automate_client_img_snippet"})

l = []
for divParent in all:
    item = divParent.find("div",{"class":"jsx-765939686 listing-block text-uppercase bg-white position-relative"})
    d = {}

    d["Name"] = item.find("div",{"class":"jsx-765939686 top-block position-relative overflow-hidden"}).find("div",{"class":"jsx-765939686 clg-name-address"}).find("h3").text

    d["Rating"] = item.find("div",{"class":"jsx-765939686 bottom-block w-100 position-relative"}).find("ul").find_all("li")[-1].find("a").find("span").text
    
    d["Location"] = item.find("div",{"class":"jsx-765939686 clg-head d-flex"}).find("span").find("span",{"class":"mr-1"}).text
    
    l.append(d)

import pandas
df = pandas.DataFrame(l)
df.to_excel("Output.xlsx")

The page keeps adding colleges as you scroll down. I don't know if I can get all the data, but is there a way to at least increase the number of responses I get? There are 2506 entries in total, as shown on the website.

pb36 · Accepted Answer

Looking at the network requests for this page, you can see that the data is fetched by an AJAX request, and that the request parameters are sent as a base64-encoded JSON object. You can call that endpoint directly, as in the code below, and parse the response into whatever format you need.
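To illustrate just the encoding step, here is a minimal sketch. It assumes the `data` query parameter is a JSON object that has been serialized and then URL-safe base64 encoded, which is what the request for this listing page appears to use. The helper names `encode_params` and `decode_params` are hypothetical and exist only for this example.

```python
import base64
import json


def encode_params(payload: dict) -> str:
    """Serialize the payload to JSON, then URL-safe base64 encode it."""
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def decode_params(data: str) -> dict:
    """Reverse of encode_params: base64 decode, then parse the JSON."""
    return json.loads(base64.urlsafe_b64decode(data.encode()).decode())


# Field values as observed for this listing page; other pages may differ.
payload = {
    "url": "management/human-resources-management-colleges",
    "stream": "13",
    "sub_stream_id": "607",
    "page": 0,
}

encoded = encode_params(payload)
print(encoded)                 # the string sent as the "data" parameter
print(decode_params(encoded))  # round-trips back to the original dict
```

Running `decode_params` on a `data` value copied from the browser's network tab is a quick way to check which fields a given request actually carries.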

Code:

import json
import pandas
import requests
import base64

collegedata = []
count = 0  # page index sent to the listing endpoint
while True:
    # The endpoint expects its parameters as base64-encoded JSON
    datadict = {"url": "management/human-resources-management-colleges", "stream": "13", "sub_stream_id": "607",
                "page": count}
    data = base64.urlsafe_b64encode(json.dumps(datadict).encode()).decode()
    params = {
        "data": data
    }
    response = requests.get('https://collegedunia.com/web-api/listing', params=params).json()
    # Collect this page's colleges before checking hasNext,
    # so the entries on the final page are not skipped
    for i in response.get("colleges", []):
        d = {}
        d["Name"] = i["college_name"]
        d["Rating"] = i["rating"]
        d["Location"] = i["college_city"] + ", " + i["state"]
        collegedata.append(d)
        print(d)
    if not response["hasNext"]:
        break
    count += 1

df = pandas.DataFrame(collegedata)
df.to_excel("Output.xlsx", index=False)
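As a variant of the loop above, the pagination logic can be separated from the HTTP call so it can be tried without hitting the site. This is only a sketch: `collect_colleges`, `fetch_page`, `max_pages` and `delay` are hypothetical names, and `fake_fetch` returns made-up placeholder records shaped like the response fields used above (`hasNext`, `colleges`, `college_name`, `rating`, `college_city`, `state`). Each page's colleges are processed before `hasNext` is checked, so the last page is kept.

```python
import time
from typing import Callable, Dict, List


def collect_colleges(fetch_page: Callable[[int], dict],
                     max_pages: int = 500,
                     delay: float = 0.0) -> List[Dict[str, object]]:
    """Call fetch_page(0), fetch_page(1), ... until hasNext is false."""
    rows = []
    for page in range(max_pages):  # hard cap so the loop always terminates
        response = fetch_page(page)
        for college in response.get("colleges", []):
            rows.append({
                "Name": college.get("college_name"),
                "Rating": college.get("rating"),
                "Location": f'{college.get("college_city")}, {college.get("state")}',
            })
        if not response.get("hasNext"):
            break
        if delay:
            time.sleep(delay)  # optional pause between requests
    return rows


def fake_fetch(page: int) -> dict:
    """Offline stand-in for the real request; the records are placeholders."""
    pages = [
        {"hasNext": True, "colleges": [
            {"college_name": "College A", "rating": 8.0,
             "college_city": "City A", "state": "State A"}]},
        {"hasNext": False, "colleges": [
            {"college_name": "College B", "rating": 7.5,
             "college_city": "City B", "state": "State B"}]},
    ]
    return pages[page]


rows = collect_colleges(fake_fetch)
print(rows)
```

To use it against the site, pass a function that builds the base64 `data` parameter for the given page number and returns the parsed JSON from `requests.get(...)`, then hand `rows` to `pandas.DataFrame` as before.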

Let me know if you have any questions :)
