William

Reputation: 3

Web scraping with Python 3.9: handling a "Load More" button

I am very new to Python. I am trying to extract the full list of countries and districts from this website: https://www.expo2020dubai.com/en/understanding-expo/participants/country-pavilions

The full list only appears after clicking a JavaScript "Load More" button, and the URL does not change when the button is clicked.

For example, one card contains the text "Mobility" and "Albania Pavilion".

I want to extract the district, which is in the element with class "content__subtitle", and the country, which is in the element with class "content__title".

Here is my full script. I can extract the cards shown on the initial page, but I don't know how to get the ones loaded by the button.

Also, I am writing the results to a CSV file, but for some reason nothing gets written.

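import csv

import requests
from bs4 import BeautifulSoup

url = "https://www.expo2020dubai.com/en/understanding-expo/participants/country-pavilions"


def extract_pavillons(url):
    """Return [country, district] rows for the cards present in the initial HTML."""
    r = requests.get(url)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(r.content, 'html.parser')

    liste = []
    for item in soup.find_all(class_=["c-innovator-card -filter-list", "c-innovator-card__container"]):
        title = item.find(class_='content__title')
        subtitle = item.find(class_='content__subtitle')
        if title is None or subtitle is None:
            continue  # skip matched elements that lack either field
        # Drop the trailing " Pavilion" from e.g. "Albania Pavilion"
        country = title.get_text(strip=True).removesuffix(' Pavilion')
        district = subtitle.get_text(strip=True)
        row = [country, district]
        if row not in liste:  # both class names may match parts of the same card
            liste.append(row)
    return liste


def write_pavillons(liste):
    # The file is opened only here; newline='' is what the csv module expects
    with open('pavillons.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['country', 'district'])
        writer.writerows(liste)


liste = extract_pavillons(url)
print(len(liste), 'rows found')  # 0 means the selector matched nothing in the HTML
write_pavillons(liste)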

Upvotes: 0

Views: 131

Answers (1)

HedgeHog

Reputation: 25048

Option I

Use their API. In the browser developer tools (Network tab, XHR requests) you can see the request the Load More button sends:

https://www.expo2020dubai.com/api/CardFilter/CardFilterLoadMore?ds=%7B1D599577-83BF-4AD3-ADDF-8409AC9CC359%7D&currentCount=8&LoadMoreCount=8&filters=%2C%2C&typeCategory=Country%2C&pageItemId=%7B91F63910-2B2D-45B7-BAA0-9338BBB101C7%7D&_=1636012101921

By changing the currentCount and LoadMoreCount parameters you can fetch additional items.
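A minimal sketch of Option I, assuming the endpoint accepts the captured query parameters unchanged apart from currentCount and LoadMoreCount. The `_` timestamp parameter is left out on the assumption that it only busts caches. The helper names (build_params, fetch_all, requests_batch) are hypothetical, and requests_batch assumes the response is an HTML fragment with the same content__title / content__subtitle markup as the page; if developer tools show JSON instead, only that function needs to change. The captured call uses currentCount=8, presumably because the first 8 cards are already in the page HTML, so starting from 0 is also an assumption to verify.

```python
from typing import Callable, Dict, List

API_URL = "https://www.expo2020dubai.com/api/CardFilter/CardFilterLoadMore"

# Decoded values of the query string captured in developer tools;
# only currentCount and LoadMoreCount are varied below.
BASE_PARAMS = {
    "ds": "{1D599577-83BF-4AD3-ADDF-8409AC9CC359}",
    "filters": ",,",
    "typeCategory": "Country,",
    "pageItemId": "{91F63910-2B2D-45B7-BAA0-9338BBB101C7}",
}


def build_params(current_count: int, load_more_count: int) -> Dict[str, str]:
    """Return the query parameters for one hypothetical 'Load More' call."""
    params = dict(BASE_PARAMS)  # copy so BASE_PARAMS is never modified
    params["currentCount"] = str(current_count)
    params["LoadMoreCount"] = str(load_more_count)
    return params


def fetch_all(
    fetch_batch: Callable[[int, int], List[List[str]]],
    batch_size: int = 8,
    start: int = 0,
    max_calls: int = 100,
) -> List[List[str]]:
    """Request batches with a growing offset until one comes back short or empty."""
    rows: List[List[str]] = []
    current = start
    for _ in range(max_calls):  # safety limit in case the API ignores the offset
        batch = fetch_batch(current, batch_size)
        rows.extend(batch)
        if len(batch) < batch_size:
            break
        current += len(batch)
    return rows


def requests_batch(current_count: int, load_more_count: int) -> List[List[str]]:
    """Fetch one batch and return [country, district] rows (assumes an HTML response)."""
    # Imported here so the paging logic above can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(API_URL, params=build_params(current_count, load_more_count), timeout=30)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    titles = soup.find_all(class_="content__title")
    subtitles = soup.find_all(class_="content__subtitle")
    # Assumes exactly one title and one subtitle per card, in document order.
    return [
        [t.get_text(strip=True).removesuffix(" Pavilion"), s.get_text(strip=True)]
        for t, s in zip(titles, subtitles)
    ]
```

Calling `rows = fetch_all(requests_batch)` and then `write_pavillons(rows)` from the question would write every batch to the CSV. Keeping the paging loop separate from the HTTP call makes it possible to check the stopping rule without hitting the site.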

Option II

Use Selenium to click the button, then grab the newly loaded items.
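A minimal sketch of Option II, assuming Selenium 4 with a working Chrome driver. The XPath that finds the button by its text is a hypothetical locator (check the real markup in developer tools), and waiting for the number of content__title elements to grow is only one possible way to detect that a batch has finished loading. The helper names click_until_gone and scrape_with_selenium are hypothetical.

```python
from typing import Callable


def click_until_gone(click_once: Callable[[], bool], max_clicks: int = 100) -> int:
    """Call click_once until it reports no click (or max_clicks); return the click count."""
    clicks = 0
    while clicks < max_clicks and click_once():
        clicks += 1
    return clicks


def scrape_with_selenium(url: str) -> str:
    """Open the page, press 'Load More' until it stops working, return the final HTML."""
    # Imported here so click_until_gone can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Hypothetical locator: any button or link whose text contains "load more", any case.
    button_locator = (
        By.XPATH,
        "//*[self::button or self::a][contains(translate(normalize-space(.), "
        "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'load more')]",
    )

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, 10)

        def click_once() -> bool:
            try:
                button = wait.until(EC.element_to_be_clickable(button_locator))
            except TimeoutException:
                return False  # no clickable button left: assume everything is loaded
            before = len(driver.find_elements(By.CLASS_NAME, "content__title"))
            # A JavaScript click avoids errors when another element overlaps the button.
            driver.execute_script("arguments[0].click();", button)
            try:
                # Wait until more cards than before are present in the DOM.
                wait.until(lambda d: len(d.find_elements(By.CLASS_NAME, "content__title")) > before)
            except TimeoutException:
                return False
            return True

        click_until_gone(click_once)
        return driver.page_source
    finally:
        driver.quit()
```

The HTML returned by `scrape_with_selenium(url)` can be passed to `BeautifulSoup(html, 'html.parser')` and parsed with the same loop as the question's extract_pavillons.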

Upvotes: 1
