Geek Online
Geek Online

Reputation: 53

How to extract data from a dropdown menu using python beautifulsoup

I am trying to scrape data from a website that has a multilevel drop-down menu every time an item is selected it changes the sub items for sub drop-downs. problem is that for every loop it extracts same sub items from the drop down items. the selection happens but it do not update the items on behalf of new selection from loop can any one help me why I am not getting the desired results. Perhaps this is because my drop-down list is in java Script or something.

for instance like this manue in the picture below: image of the dropdown i have gone this far:

enter code here

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
import csv

//#from selenium.webdriver.support import Select 
import time

print ("opening chorome....")  
driver = webdriver.Chrome()
driver.get('https://www.wheelmax.com/')
time.sleep(10)

csvData = ['Year', 'Make', 'Model', 'Body', 'Submodel', 'Size']

//#variables
yeart = []
make= []
model=[]
body = []
submodel = []
size = []
Yindex = Mkindex = Mdindex = Bdindex = Smindex = Sindex = 0

print ("waiting for program to set variables....")
time.sleep(20)

print ("initializing and setting variables....")

//#initializing Year
Year = Select(driver.find_element_by_id("icm-years-select"))
Year.select_by_value('2020')
yr = driver.find_elements(By.XPATH, '//*[@id="icm-years-select"]')
time.sleep(15)

//#initializing Make
Make = Select(driver.find_element_by_id("icm-makes-select"))
Make.select_by_index(1)
mk = driver.find_elements(By.XPATH, '//*[@id="icm-makes-select"]')
time.sleep(15)

//#initializing Model
Model = Select(driver.find_element_by_id("icm-models-select"))
Model.select_by_index(1)
mdl = driver.find_elements(By.XPATH, '//*[@id="icm-models-select"]')
time.sleep(15)

//#initializing body
Body = Select(driver.find_element_by_id("icm-drivebodies-select"))
Body.select_by_index(1)
bdy = driver.find_elements(By.XPATH, '//*[@id="icm-drivebodies-select"]')
time.sleep(15)

//#initializing submodel
Submodel = Select(driver.find_element_by_id("icm-submodels-select"))
Submodel.select_by_index(1)
sbm = driver.find_elements(By.XPATH, '//*[@id="icm-submodels-select"]')
time.sleep(15)

//#initializing size
Size = Select(driver.find_element_by_id("icm-sizes-select"))
Size.select_by_index(0)
siz = driver.find_elements(By.XPATH, '//*[@id="icm-sizes-select"]')
time.sleep(5)


Cyr = Cmk = Cmd = Cbd = Csmd = Csz = ""

print ("fetching data from variables....")

for y in yr:
    obj1 = driver.find_element_by_id("icm-years-select")
    Year = Select(obj1)
    Year.select_by_index(++Yindex)
    obj1.click()
    #obj1.click()
    yeart.append(y.text)
    Cyr = y.text
    time.sleep(10)
    for m in mk:
        obj2 = driver.find_element_by_id("icm-makes-select")
        Make = Select(obj2)
        Make.select_by_index(++Mkindex)
        obj2.click()
        #obj2.click()
        make.append(m.text)
        Cmk = m.text
        time.sleep(10)
        for md in mdl:
            Mdindex =0
            obj3 = driver.find_element_by_id("icm-models-select")
            Model = Select(obj3)
            Model.select_by_index(++Mdindex)
            obj3.click()
            #obj3.click(clickobj)
            model.append(md.text)
            Cmd = md.text
            time.sleep(10)
            Bdindex = 0
            for bd in bdy:
                obj4 = driver.find_element_by_id("icm-drivebodies-select")
                Body = Select(obj4)
                Body.select_by_index(++Bdindex)
                obj4.click()
                #obj4.click(clickobj2)
                body.append(bd.text)
                Cbd = bd.text
                time.sleep(10)
                Smindex = 0
                for sm in sbm:
                    obj5 = driver.find_element_by_id("icm-submodels-select")
                    Submodel = Select(obj5)
                    obj5.click()
                    Submodel.select_by_index(++Smindex)
                    #obj5.click(clickobj5)
                    submodel.append(sm.text)
                    Csmd = sm.text
                    time.sleep(10)
                    Sindex = 0
                    for sz in siz:
                        Size = Select(driver.find_element_by_id("icm-sizes-select"))
                        Size.select_by_index(++Sindex)
                        size.append(sz.text)
                        Scz = sz.text
                        csvData += [Cyr, Cmk, Cmd, Cbd,Csmd, Csz]

Upvotes: 4

Views: 6988

Answers (3)

bharatk
bharatk

Reputation: 4315

Because of https://www.wheelmax.com has multilevel drop-down menu dependent on each other for example if you select Select Year drop down option, after selected year based on Select Make drop down is enable and display option based on the selected year option.

So basically you need to use Selenium package for handle dynamic option.

Install selenium web driver as per your browser

Download chrome web driver :

http://chromedriver.chromium.org/downloads

Install web driver for chrome browser:

unzip ~/Downloads/chromedriver_linux64.zip -d ~/Downloads
chmod +x ~/Downloads/chromedriver
sudo mv -f ~/Downloads/chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

selenium tutorial

https://selenium-python.readthedocs.io/

Eg. using selenium to select multiple dropdown options

from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time

driver = webdriver.Chrome()
driver.get('https://www.wheelmax.com/')
time.sleep(4)

selectYear = Select(driver.find_element_by_id("icm-years-select"))
selectYear.select_by_value('2019')

time.sleep(2)

selectMakes = Select(driver.find_element_by_id("icm-makes-select"))
selectMakes.select_by_value('58')

Update:

select drop down option value or count total options

for option in selectYear.options:
    print(option.text)

print(len(selectYear.options))

Se more

Upvotes: 1

QHarr
QHarr

Reputation: 84465

How to extract data from a dropdown menu using python beautifulsoup

The page does a callback to populate with years. Simply mimic that.

If you actually need to change years and select from dependent drop downs, which becomes a different question, you need browser automation e.g. selenium, or to manually perform this and inspect network tab to see if there is an xhr request you can mimic to submit your choices.

import requests
​
r = requests.get('https://www.iconfigurators.com/json2/?returnType=json&bypass=true&id=13898&callback=yearObj').json()
years = [item['year'] for item in r['years']]
print(years)

Upvotes: 1

Mario Christov
Mario Christov

Reputation: 143

I guess the reason you can't parse the years with beautiful soup is because the 'select' tag containing the 'option' tags with all the years is not present yet/is hidden at the moment when beautiful soup downloads the page. It is added to the DOM by executing additional JavaScript I assume. If you look at the DOM of the loaded page using the developer tools of your browser, for example F12 for Mozilla, you'll see that the tag containing the information you look for is: <select id="icm-years-select"">. If you try to parse for this tag with the object downloaded with beautiful soup, you get an empty list of tag objects:

from bs4 import BeautifulSoup
from requests import get
response = get('https://www.wheelmax.com/')
yourSoup = BeautifulSoup(response.text, "lxml")
print(len(yourSoup.select('div #vehicle-search'))) // length = 1 -> visible
print()
print(len(yourSoup.select('#icm-years-select')))    // length = 0 -> not visible

So if you want to get the years by using Python by all means, I guess you might try to click on the respective tag and then parse again using some combination of requests/beautiful soup/ or the selenium module which will require a bit more digging :-)

Otherwise if you just quickly need the years parsed, use JavaScript:

countYears = document.getElementById('icm-years-select').length;
yearArray = [];
for (i = 0; i < countYears; i++) {yearArray.push(document.getElementById('icm-years-select')[i].value)};

Upvotes: 0

Related Questions