najat ma

Reputation: 13

Web scraping after clicking on two buttons

I am trying to scrape data from this website, http://rgphentableaux.hcp.ma/Default1/, by clicking the two radio buttons and then choosing an entry from a list (screenshot not included).

I need to do this for every choice available in that list and add the resulting tables to a DataFrame. Here is what I have tried so far, but it didn't work:

    # Install Selenium from a shell first (not inside Python): pip install selenium
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    url = "http://rgphentableaux.hcp.ma/Default1/"
    browser.get(url)  # navigate to the page
    browser.find_element(By.XPATH, ".//input[@type='radio' and @value='5']").click()
    browser.find_element(By.ID, "CGEO").click()
    time.sleep(3)
    browser.find_element(By.XPATH, ".//input[@type='button' and @value='Afficher']").click()
    tabs = browser.find_elements(By.ID, "IEE")
    innerHTML = browser.execute_script("return document.body.innerHTML")
    soup_level2 = BeautifulSoup(innerHTML, "html.parser")

P.S. I also need to get the tables shown in the second screenshot (not included).

Upvotes: 1

Views: 109

Answers (2)

QHarr

Reputation: 84465

You can do the whole thing with requests and bs4 by mimicking the requests the page itself makes. Just loop over the regions in order, passing each region's code as the 'CGEO' param of each request.


This:

    soup = bs(s.get(url).content, 'lxml')
    regions = {i.text.strip(): i['value'].strip() for i in soup.select('#REGIONSLIST option')}

builds a dict mapping each region name (the option text) to its option value, taken from the landing URL.
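For reference, the same name-to-value mapping can be built with only the standard library's html.parser, which makes explicit what the one-liner does. This is a sketch, not the answer's code: OptionCollector is a hypothetical name, it assumes the element with id REGIONSLIST wraps <option value=...> tags that have explicit closing tags, and the sample markup is made up (the '01' code for Tanger-Tetouan-Al Hoceima is the one given further down; 'Example Region'/'99' is a placeholder).

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Build {option text: option value} for <option> tags inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.container_tag = None  # tag name of the matching container while inside it
        self.current_value = None  # value attribute of the <option> being read
        self.current_text = []     # text chunks of that <option>
        self.options = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == self.container_id:
            self.container_tag = tag
        elif tag == "option" and self.container_tag is not None:
            self.current_value = (attrs.get("value") or "").strip()
            self.current_text = []

    def handle_data(self, data):
        if self.current_value is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self.current_value is not None:
            name = "".join(self.current_text).strip()
            self.options[name] = self.current_value
            self.current_value = None
        elif tag == self.container_tag:
            self.container_tag = None


# Hypothetical markup standing in for the landing page ("Example Region"/"99" is a placeholder)
sample = """
<select id="REGIONSLIST">
  <option value="01">Tanger-Tetouan-Al Hoceima</option>
  <option value="99">Example Region</option>
</select>
"""

collector = OptionCollector("REGIONSLIST")
collector.feed(sample)
print(collector.options)
# {'Tanger-Tetouan-Al Hoceima': '01', 'Example Region': '99'}
```

Iterating over collector.options.items() then gives the same (name, code) pairs the answer loops over.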


This:

    for k, v in regions.items():
        params = (('type', 'Region'), ('CGEO', v), ('them', '5'))

sets the CGEO param to the region's option value attribute, e.g. '01' for Tanger-Tetouan-Al Hoceima.

The Region option is selected via the type param.

The Langues locales utilisées option is selected via the them param, i.e. '5'.
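To see the request these params produce, the tuple can be encoded by hand with urllib.parse.urlencode from the standard library. For plain ASCII values like these it should yield the same query string requests builds from params=, though this sketch does not call requests itself.

```python
from urllib.parse import urlencode

url = "http://rgphentableaux.hcp.ma/Default1"
region_code = "01"  # Tanger-Tetouan-Al Hoceima, per the answer

# Same (key, value) pairs the answer passes as params=
params = (("type", "Region"), ("CGEO", region_code), ("them", "5"))

request_url = f"{url}/getDATA/?{urlencode(params)}"
print(request_url)
# http://rgphentableaux.hcp.ma/Default1/getDATA/?type=Region&CGEO=01&them=5
```

Opening that URL in a browser is a quick way to inspect the JSON before writing any parsing code.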


This:

    for y in range(3):
        row.extend([data[i-y+2]['DATA2014']])

reverses each group of three consecutive dictionaries in data. They arrive in the order Ens, Fem, Masc, so this appends their DATA2014 values to the row in the desired output order Masc, Fem, Ens.
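A small check of that indexing on made-up data: the INDICATEUR/DATA2014 keys and the Ens, Fem, Masc order follow the answer, while 'IND_LangA' and the numbers are invented. It also shows that reversing the three-item slice gives the same values.

```python
# Made-up sample; only the keys and the Ens, Fem, Masc order come from the answer
data = [
    {"INDICATEUR": "IND_LangA", "DATA2014": 10.0},  # Ens
    {"INDICATEUR": "IND_LangA", "DATA2014": 11.0},  # Fem
    {"INDICATEUR": "IND_LangA", "DATA2014": 9.0},   # Masc
]

i = 0
row = ["Some region", data[i]["INDICATEUR"].split("_")[-1]]
for y in range(3):
    # y=0 -> data[i+2] (Masc), y=1 -> data[i+1] (Fem), y=2 -> data[i] (Ens)
    row.extend([data[i - y + 2]["DATA2014"]])
print(row)  # ['Some region', 'LangA', 9.0, 11.0, 10.0]

# The same three values, taken by reversing the 3-item slice instead of index arithmetic
reversed_values = [d["DATA2014"] for d in reversed(data[i:i + 3])]
print(reversed_values == row[2:])  # True
```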


Py:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup as bs

    def add_rows(region, data):
        # Only the first third of data (the ensemble table) is used here;
        # each indicator is 3 consecutive dicts ordered Ens, Fem, Masc
        for i in range(0, len(data)//3, 3):
            # The last '_'-separated part of INDICATEUR is used as the 'Lang' label
            row = [region, data[i]['INDICATEUR'].split('_')[-1]]
            # Walk the 3 dicts backwards to get Masc, Fem, Ens
            for y in range(3):
                row.extend([data[i-y+2]['DATA2014']])
            final.append(row)

    url = 'http://rgphentableaux.hcp.ma/Default1'
    headers = {'User-Agent': 'Mozilla/5.0', 'Referer': url}
    final = []

    with requests.Session() as s:
        s.headers.update(headers)
        soup = bs(s.get(url).content, 'lxml')
        # Map each region name to its option value (the CGEO code)
        regions = {i.text.strip(): i['value'].strip() for i in soup.select('#REGIONSLIST option')}

        for k, v in regions.items():
            # type=Region, CGEO=<region code>, them=5 (Langues locales utilisées)
            params = (('type', 'Region'), ('CGEO', v), ('them', '5'))
            r = s.get(f'{url}/getDATA/', params=params)
            data = r.json()
            add_rows(k, data)

    df = pd.DataFrame(final, columns=['Region', 'Lang', 'Masc', 'Fem', 'Ens'])
    print(df)

EDIT:

To get all 3 tables (ensemble, urbain, rural), adjust the custom function as shown below and add the extra loop over n in range(0, len(data), block), which splits each region's response into three equal blocks, one per table:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup as bs

    def add_rows(table, region, data_block):
        # Each indicator is 3 consecutive dicts ordered Ens, Fem, Masc
        for i in range(0, len(data_block), 3):
            row = [table, region, data_block[i]['INDICATEUR'].split('_')[-1]]
            # Walk the 3 dicts backwards to get Masc, Fem, Ens
            for y in range(3):
                row.extend([data_block[i-y+2]['DATA2014']])
            final.append(row)

    url = 'http://rgphentableaux.hcp.ma/Default1'
    headers = {'User-Agent': 'Mozilla/5.0', 'Referer': url}
    tables = ['ens', 'urb', 'rur']
    final = []

    with requests.Session() as s:
        s.headers.update(headers)
        soup = bs(s.get(url).content, 'lxml')
        regions = {i.text.strip(): i['value'].strip() for i in soup.select('#REGIONSLIST option')}

        for k, v in regions.items():
            params = (('type', 'Region'), ('CGEO', v), ('them', '5'))
            r = s.get(f'{url}/getDATA/', params=params)
            data = r.json()
            # Assumes the ens, urb and rur tables come back to back, one third each
            block = len(data)//3

            for n in range(0, len(data), block):
                table = tables[n//block]
                add_rows(table, k, data[n:n+block])

    df = pd.DataFrame(final, columns=['Table', 'Region', 'Language', 'Masc', 'Fem', 'Ens'])
    print(df)
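The block-splitting step can be exercised without hitting the site. This sketch builds a made-up payload with the layout the EDIT assumes (ens, urb and rur tables back to back, each a run of Ens, Fem, Masc triples); the language labels and values are placeholders. For brevity it reverses each triple with reversed(chunk[i:i + 3]), which picks the same values as the data_block[i-y+2] loop.

```python
# Made-up payload laid out the way the EDIT assumes
tables = ["ens", "urb", "rur"]
langs = ["LangA", "LangB"]  # hypothetical language labels

data = []
for t in tables:
    for lang in langs:
        for sex in ["Ens", "Fem", "Masc"]:
            # DATA2014 holds a readable tag instead of a real figure
            data.append({"INDICATEUR": f"{t}_{sex}_{lang}", "DATA2014": f"{t}-{lang}-{sex}"})

final = []
block = len(data) // 3           # 18 // 3 = 6 dicts per table
for n in range(0, len(data), block):
    table = tables[n // block]   # n = 0, 6, 12 -> 'ens', 'urb', 'rur'
    chunk = data[n:n + block]
    for i in range(0, len(chunk), 3):
        row = [table, "Some region", chunk[i]["INDICATEUR"].split("_")[-1]]
        # Reverse the Ens, Fem, Masc triple into Masc, Fem, Ens
        row.extend(d["DATA2014"] for d in reversed(chunk[i:i + 3]))
        final.append(row)

for row in final:
    print(row)
```

The split relies on len(data) being a non-zero multiple of 3. With 19 items, for example, block is 6 and the loop also visits n = 18, so tables[3] raises IndexError; with an empty response block is 0 and range() raises ValueError.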

Upvotes: 1

undetected Selenium

Reputation: 193098

To select the Langues locales utilisées and Region options, pick a region from the list and scrape the resulting table, you can use the following solution:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)
    driver.get("http://rgphentableaux.hcp.ma/Default1/")
    wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@text='Langues locales utilisées']"))).click()
    driver.find_element(By.XPATH, "//input[@value='Region']").click()
    # Scroll the entity picker into view before clicking it
    picker = wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@value='Choisir une entitée']")))
    driver.execute_script("arguments[0].scrollIntoView(true);", picker)
    picker.click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//li[contains(., 'Tanger-Tetouan-Al Hoceima')]"))).click()
    driver.find_element(By.XPATH, "//input[@value='Afficher']").click()
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tableau']/tbody"))).text)

Console Output:

    Population municipale 16 747 522 16 862 562 33 610 084
    Répartition selon les grands groupes d'âges
    Moins de 6 ans 12.4 11.8 12.1
    De 6 à 14 ans 16.5 15.7 16.1
    De 15 à 59 ans 61.8 63.0 62.4
    60 ans et plus 9.3 9.5 9.4
    Répartition selon le groupe d'âges quinquennal
    0-4 ans 10.4 9.9 10.2
    5-9 ans 9.2 8.8 9.0
    10-14 ans 9.3 8.8 9.0
    15-19 ans 8.9 8.8 8.9
    20-24 ans 9.0 9.1 9.1
    25-29 ans 8.2 8.4 8.3
    30-34 ans 7.7 8.0 7.8
    35-39 ans 6.8 7.2 7.0
    40-44 ans 6.3 6.5 6.4
    45-49 ans 5.3 5.6 5.4
    50-54 ans 5.3 5.4 5.3
    55-59 ans 4.2 4.0 4.1
    60-64 ans 3.4 3.3 3.4
    65-69 ans 1.9 1.9 1.9
    70-74 ans 1.6 1.8 1.7
    75 ans et plus 2.4 2.6 2.5
    État matrimonial
    Célibataire 57.9 48.4 53.2
    Marié 40.8 42.0 41.4
    Divorcé 0.7 2.4 1.6
    Veuf 0.6 7.1 3.9
    Âge moyen au premier mariage 31.3 25.7 28.5
    Fécondité
    Parité moyenne à 45-49 ans / 3.5 /
    Indice synthétique de fécondité / 2.2 /
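The .text printed above runs each row's label and numbers together, and because the thousands separators are spaces, a line like "Population municipale 16 747 522 16 862 562 33 610 084" cannot be split back into columns reliably. Below is a sketch that reads the table cell by cell instead, using only the standard library's html.parser. It assumes the table uses plain tr/td (or th) cells; TableRows and table_rows are illustrative names, and the sample markup is made up (only the numbers come from the output above).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None   # cells of the <tr> being read
        self.cell = None  # text chunks of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None and self.row is not None:
            # Collapse runs of whitespace but keep "16 747 522" together as one cell
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None


def table_rows(table_html):
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Hypothetical markup; with Selenium the real string could come from
# driver.find_element(By.XPATH, "//table[@class='tableau']").get_attribute("outerHTML")
sample = """
<table class="tableau"><tbody>
  <tr><td>Population municipale</td><td>16 747 522</td><td>16 862 562</td><td>33 610 084</td></tr>
  <tr><td>Moins de 6 ans</td><td>12.4</td><td>11.8</td><td>12.1</td></tr>
</tbody></table>
"""

rows = table_rows(sample)
print(rows)
```

Each row is then a list like ['Population municipale', '16 747 522', '16 862 562', '33 610 084'], ready for pd.DataFrame or for stripping the spaces before converting to numbers.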

Upvotes: 0
