I am very new to Python. I am trying to extract the full list of countries and districts from this website: https://www.expo2020dubai.com/en/understanding-expo/participants/country-pavilions
To get the full list, you have to click a JavaScript "Load More" button, and the URL doesn't change when you click it.
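A "Load More" button that leaves the URL unchanged usually fetches the next batch through a background HTTP request, which can often be seen in the browser developer tools (Network tab). A minimal sketch, assuming such a request exists and takes a page number: the loop keeps requesting the next page until an empty batch comes back. `fetch_page`, `fake_fetch_page` and the placeholder rows are hypothetical; the real endpoint and its parameters would have to be read from the Network tab.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str]  # (country, district)


def collect_all_pages(fetch_page: Callable[[int], List[Row]], max_pages: int = 50) -> List[Row]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    rows: List[Row] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty batch means there is nothing left to "load more"
            break
        rows.extend(batch)
    return rows


# Hypothetical stand-in for the real background request (placeholder data only)
FAKE_PAGES = {
    1: [("Country A", "District A"), ("Country B", "District B")],
    2: [("Country C", "District C")],
}


def fake_fetch_page(page: int) -> List[Row]:
    return FAKE_PAGES.get(page, [])
```

In a real script, `fetch_page` would wrap something like `requests.get(...)` against the discovered endpoint and parse its response; `max_pages` is only a safety stop in case the endpoint never returns an empty page.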
For example, one card shows "Mobility" and "Albania Pavilion". I want to extract the district from the `content__subtitle` class and the country from the `content__title` class.
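The script below removes the " Pavilion" suffix with `country[:len(country) - 9]`, which assumes every title ends with exactly those 9 characters and has no surrounding whitespace. A slightly more defensive sketch (the helper name `clean_pavilion` is hypothetical) strips whitespace first and only removes the suffix when it is actually present:

```python
from typing import Tuple

SUFFIX = " Pavilion"


def clean_pavilion(title: str, subtitle: str) -> Tuple[str, str]:
    """Turn the raw title/subtitle text of one card into (country, district)."""
    country = title.strip()
    if country.endswith(SUFFIX):  # only cut the suffix when it is really there
        country = country[: -len(SUFFIX)]
    return country, subtitle.strip()
```

With the card above, `clean_pavilion("Albania Pavilion", "Mobility")` gives `("Albania", "Mobility")`, and a title without the suffix is left intact instead of being truncated.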
Here is my full script. I can extract the data from the first page, but I don't know how to get the other pages.
Also, I am writing the results to a CSV file, but for some reason nothing gets written.
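`csv.writer.writerows` on an empty list produces an empty file without raising any error, so an empty CSV usually means nothing was collected rather than that writing failed. A minimal sketch that makes this visible: it reports how many rows it wrote and always writes a header row (the header and the function name `write_rows` are assumptions, not part of the original script).

```python
import csv
from typing import List, Sequence


def write_rows(path: str, rows: List[Sequence[str]]) -> int:
    """Write a header plus rows to a CSV file and return how many rows were written."""
    if not rows:
        print("Warning: no rows were collected; the CSV will only contain the header")
    # newline="" is what the csv module docs recommend for files passed to csv.writer
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "district"])
        writer.writerows(rows)
    return len(rows)
```

Calling `write_rows("pavillons.csv", liste)` and getting 0 back would point at the scraping step, not at the CSV code.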
    import csv

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.expo2020dubai.com/en/understanding-expo/participants/country-pavilions"
    liste = []

    def extract_pavillons(url):
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # Each pavilion card holds the country (title) and the district (subtitle)
        for item in soup.find_all(class_=["c-innovator-card -filter-list", "c-innovator-card__container"]):
            country = item.find(class_='content__title').text
            # Drop the trailing " Pavilion" (9 characters)
            country = country[:len(country) - 9]
            district = item.find(class_='content__subtitle').text
            liste.append([country, district])

    def write_pavillons(liste):
        # The file is opened only here; newline='' is required by the csv module
        with open('pavillons.csv', 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerows(liste)

    extract_pavillons(url)
    write_pavillons(liste)
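If the cards are inserted by JavaScript after the page loads, the HTML that `requests` downloads may not contain them at all, which would also leave `liste` (and therefore the CSV) empty. A quick check, sketched here with only the standard-library `html.parser` (the names `ClassCounter` and `count_elements_with_class` are hypothetical), counts the elements carrying a given class in raw HTML:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class can hold several space-separated names; value may be None
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_elements_with_class(html: str, class_name: str) -> int:
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count
```

For instance, if `count_elements_with_class(requests.get(url).text, "content__title")` returns 0, the cards are most likely rendered client-side, and the data would have to come from the background request or from a browser automation tool instead.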