Csongor

Reputation: 593

BeautifulSoup + Python: targeting an HTML UL, creating a list and appending to variables

I'm trying to scrape Autotrader's website to get an Excel sheet of the cars' names and stats.

I'm stuck trying to loop through an HTML 'ul' element without any classes or IDs and organize its contents into a Python list, so that I can then append the individual 'li' elements to different fields in my table.

As you can see, I'm able to target the title and price elements, but the 'ul' is really tricky... well, for someone at my skill level.
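For reference, the pattern described above (collect every li of a ul into a Python list, then spread the items over named fields) can be sketched on a hard-coded HTML fragment. The markup and the sample values (copied from the output shown in one of the answers) are assumptions; the live Autotrader page may be structured differently. The sketch also shows why looping over find('li') does not give you all the items.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment reusing the class name from the question;
# the real Autotrader markup may differ.
html = """
<ul class="listing-key-specs">
  <li>2002 (52 reg)</li>
  <li>Hatchback</li>
  <li>123,065 miles</li>
  <li>1.4L</li>
  <li>70bhp</li>
  <li>Manual</li>
  <li>Diesel</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", class_="listing-key-specs")

# Pitfall: find("li") returns only the FIRST <li>, and looping over a Tag
# walks that tag's children (here just its text), not the other <li>s.
first_only = [str(child) for child in ul.find("li")]

# find_all("li") returns every <li>; keep just the text of each one.
specs = [li.get_text(strip=True) for li in ul.find_all("li")]

# Spread the positional values over named fields (one column per position).
fields = ["Year", "Body Type", "Mileage", "Engine Size", "HP", "Transmission", "Petrol Type"]
row = dict(zip(fields, specs))

print(first_only)
print(row)
```

Because zip() stops at the shorter input, a listing with fewer than seven specs yields a row with missing keys rather than the IndexError that lis[6] would raise.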

The specific code I'm struggling with:

for i in range(1, 2):
    response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
    html_soup = BeautifulSoup(response.text, 'html.parser')
    ad_containers = html_soup.find_all('h2', class_ = 'listing-title title-wrap')
    price_containers = html_soup.find_all('section', class_ = 'price-column')

    for container in ad_containers:
        name = container.find('a', class_ ="js-click-handler listing-fpa-link").text
        names.append(name)
        # Trying to loop through the key specs list and assign each 'li' to a different field in the table
        lis = []
        list_container = container.find('ul', class_='listing-key-specs')
        for li in list_container.find('li'):
            lis.append(li)
        year.append(lis[0])
        body_type.append(lis[1])
        milage.append(lis[2])
        engine.append(lis[3])
        hp.append(lis[4])
        transmission.append(lis[5])
        petrol_type.append(lis[6])
        lis = [] # Clearing the list to get ready for the next set of data

And the error message I get is the following: [error screenshot]

Full code here:

from requests import get
from bs4 import BeautifulSoup
import pandas
# from time import sleep, time
# import random

# Create table fields
names = []
prices = []
year = []
body_type = []
milage = []
engine = []
hp = []
transmission = []
petrol_type = []

for i in range(1, 2):
    # Make a get request
    response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
    # Pause the loop
    # sleep(random.randint(4, 7))
    # Create containers
    html_soup = BeautifulSoup(response.text, 'html.parser')
    ad_containers = html_soup.find_all('h2', class_ = 'listing-title title-wrap')
    price_containers = html_soup.find_all('section', class_ = 'price-column')

    for container in ad_containers:
        name = container.find('a', class_ ="js-click-handler listing-fpa-link").text
        names.append(name)
        # Trying to loop through the key specs list and assign each 'li' to a different field in the table
        lis = []
        list_container = container.find('ul', class_='listing-key-specs')
        for li in list_container.find('li'):
            lis.append(li)
        year.append(lis[0])
        body_type.append(lis[1])
        milage.append(lis[2])
        engine.append(lis[3])
        hp.append(lis[4])
        transmission.append(lis[5])
        petrol_type.append(lis[6])
        lis = [] # Clearing the list to get ready for the next set of data
    for pricteainers in price_containers:
        price = pricteainers.find('div', class_ ='vehicle-price').text
        prices.append(price)

test_df = pandas.DataFrame({'Title': names, 'Price': prices, 'Year': year, 'Body Type': body_type, 'Mileage': milage, 'Engine Size': engine, 'HP': hp, 'Transmission': transmission, 'Petrol Type': petrol_type})
print(test_df.info())
# test_df.to_csv('Autotrader_test.csv')

Upvotes: 1

Views: 1096

Answers (2)

Ali

Reputation: 1357

I followed the advice David gave in the comments on the other answer.

Code:

from requests import get
from bs4 import BeautifulSoup
import pandas as pd

# Widen the console output so every column is printed
pd.set_option('display.width', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

names = []
prices = []
year = []
body_type = []
milage = []
engine = []
hp = []
transmission = []
petrol_type = []

for i in range(1, 2):
    response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
    html_soup = BeautifulSoup(response.text, 'html.parser')

    # Each <article> wraps one whole listing: title, price and key specs
    outer = html_soup.find_all('article', class_='search-listing')
    for inner in outer:
        lis = []
        # The title is read from the second matching link in the listing
        names.append(inner.find_all('a', class_ ="js-click-handler listing-fpa-link")[1].text)
        prices.append(inner.find('div', class_='vehicle-price').text)
        for ul in inner.find_all('ul', class_='listing-key-specs'):
            # Keep the text of the last seven <li> items (year through fuel type)
            for spec in ul.find_all('li')[-7:]:
                lis.append(spec.text)
        year.append(lis[0])
        body_type.append(lis[1])
        milage.append(lis[2])
        engine.append(lis[3])
        hp.append(lis[4])
        transmission.append(lis[5])
        petrol_type.append(lis[6])

# from_dict(orient='index') builds one row per field, so lists of unequal length
# do not raise; transpose() then turns those rows back into columns
test_df = pd.DataFrame.from_dict({'Title': names, 'Price': prices, 'Year': year, 'Body Type': body_type, 'Mileage': milage, 'Engine Size': engine, 'HP': hp, 'Transmission': transmission, 'Petrol Type': petrol_type}, orient='index')
print(test_df.transpose())

Output:

                                Title Price           Year  Body Type        Mileage Engine Size      HP Transmission Petrol Type
0    Citroen C3 1.4 HDi Exclusive 5dr  £500  2002 (52 reg)  Hatchback  123,065 miles        1.4L   70bhp       Manual      Diesel
1                Volvo V40 1.6 XS 5dr  £585   1999 (V reg)     Estate  125,000 miles        1.6L  109bhp       Manual      Petrol
2  Toyota Yaris 1.3 VVT-i 16v GLS 3dr  £700   2000 (W reg)  Hatchback   94,000 miles        1.3L   85bhp    Automatic      Petrol
3               MG Zt-T 2.5 190 + 5dr  £750  2002 (52 reg)     Estate   95,000 miles        2.5L  188bhp       Manual      Petrol
4       Volkswagen Golf 1.9 SDI E 5dr  £795  2001 (51 reg)  Hatchback  153,000 miles        1.9L   68bhp       Manual      Diesel
5   Volkswagen Polo 1.9 SDI Twist 5dr  £820  2005 (05 reg)  Hatchback  106,116 miles        1.9L   64bhp       Manual      Diesel
6     Volkswagen Polo 1.4 S 3dr (a/c)  £850  2002 (02 reg)  Hatchback  125,640 miles        1.4L   75bhp       Manual      Petrol
7              KIA Picanto 1.1 LX 5dr  £990  2005 (05 reg)  Hatchback  109,000 miles        1.1L   64bhp       Manual      Petrol
8    Vauxhall Corsa 1.2 i 16v SXi 3dr  £995  2004 (54 reg)  Hatchback   81,114 miles        1.2L   74bhp       Manual      Petrol
9           Volkswagen Beetle 1.6 3dr  £995  2003 (53 reg)  Hatchback  128,000 miles        1.6L  102bhp       Manual      Petrol
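The from_dict(orient='index') plus transpose() pair in the code above is what lets the DataFrame be built even if the lists end up with different lengths, where a plain pd.DataFrame(dict_of_lists) raises a ValueError. A small sketch with made-up columns (hypothetical data, not scraped):

```python
import pandas as pd

# Hypothetical columns of unequal length, e.g. one listing had no price scraped.
data = {"Title": ["Citroen C3", "Volvo V40"], "Price": ["£500"]}

# pd.DataFrame(data) would raise ValueError here because the lists differ in length.
# from_dict(orient="index") makes one ROW per key and pads short rows with missing values...
df = pd.DataFrame.from_dict(data, orient="index")

# ...and transpose() turns those rows back into the intended columns.
df = df.transpose()

print(df)
```

Note that the padding only hides a mismatch: values are aligned by position, so the gap always lands at the end of the short column, which is not necessarily the listing that was actually missing the field.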

Upvotes: 1

dVeza

Reputation: 552

The ul is not a child of the h2. It's a sibling.

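So container.find('ul', class_='listing-key-specs') searches only inside the h2, finds nothing and returns None, and calling .find('li') on that None fails. You will need to make a separate selection, because the ul is not part of the ad_containers.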
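A minimal sketch of two ways to make that separate selection, on a hard-coded fragment. The markup is an assumption mirroring the structure described in this answer (the ul next to the h2, both inside an article as in the other answer's code), not a copy of Autotrader's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <ul> sits next to the <h2>, both inside one <article>.
html = """
<article class="search-listing">
  <h2 class="listing-title title-wrap">
    <a class="js-click-handler listing-fpa-link" href="#">Citroen C3 1.4 HDi Exclusive 5dr</a>
  </h2>
  <ul class="listing-key-specs">
    <li>2002 (52 reg)</li>
    <li>Hatchback</li>
  </ul>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

inside = []   # what searching *inside* each <h2> finds
beside = []   # specs found by stepping to the next sibling instead
for h2 in soup.find_all("h2", class_="listing-title title-wrap"):
    # The <ul> is not a descendant of the <h2>, so this appends None
    inside.append(h2.find("ul", class_="listing-key-specs"))

    # Option 1: move sideways to the following sibling <ul>
    ul = h2.find_next_sibling("ul", class_="listing-key-specs")
    beside.append([li.get_text(strip=True) for li in ul.find_all("li")])

# Option 2: start from a common parent and select both parts from it
rows = []
for article in soup.find_all("article", class_="search-listing"):
    title = article.find("h2").get_text(strip=True)
    specs = [li.get_text(strip=True) for li in article.select("ul.listing-key-specs li")]
    rows.append((title, specs))

print(inside)
print(beside)
print(rows)
```

Option 2 is the more robust choice when several parts of a listing (title, price, specs) are needed, since each one is looked up relative to the same container.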

Upvotes: 1
