Reputation: 593
I'm trying to scrape Autotrader's website to get an excel of the stats and names.
I'm stuck at trying to loop through an html 'ul' element without any classes or IDs and organize that info in python list to then append the individual li elements to different fields in my table.
As you can see I'm able to target the title and price elements, but the 'ul' is really tricky... Well... for someone at my skill level.
The specific code I'm struggling with:
for i in range(1, 2):
response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
html_soup = BeautifulSoup(response.text, 'html.parser')
ad_containers = html_soup.find_all('h2', class_ = 'listing-title title-wrap')
price_containers = html_soup.find_all('section', class_ = 'price-column')
for container in ad_containers:
name = container.find('a', class_ ="js-click-handler listing-fpa-link").text
names.append(name)
# Trying to loop through the key specs list and assigned each 'li' to a different field in the table
lis = []
list_container = container.find('ul', class_='listing-key-specs')
for li in list_container.find('li'):
lis.append(li)
year.append(lis[0])
body_type.append(lis[1])
milage.append(lis[2])
engine.append(lis[3])
hp.append(lis[4])
transmission.append(lis[5])
petrol_type.append(lis[6])
lis = [] # Clearing dictionary to get ready for next set of data
And the error message I get is the following:
Full code here:
from requests import get
from bs4 import BeautifulSoup
import pandas
# from time import sleep, time
# import random
# Create table fields
names = []
prices = []
year = []
body_type = []
milage = []
engine = []
hp = []
transmission = []
petrol_type = []
for i in range(1, 2):
# Make a get request
response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
# Pause the loop
# sleep(random.randint(4, 7))
# Create containers
html_soup = BeautifulSoup(response.text, 'html.parser')
ad_containers = html_soup.find_all('h2', class_ = 'listing-title title-wrap')
price_containers = html_soup.find_all('section', class_ = 'price-column')
for container in ad_containers:
name = container.find('a', class_ ="js-click-handler listing-fpa-link").text
names.append(name)
# Trying to loop through the key specs list and assigned each 'li' to a different field in the table
lis = []
list_container = container.find('ul', class_='listing-key-specs')
for li in list_container.find('li'):
lis.append(li)
year.append(lis[0])
body_type.append(lis[1])
milage.append(lis[2])
engine.append(lis[3])
hp.append(lis[4])
transmission.append(lis[5])
petrol_type.append(lis[6])
lis = [] # Clearing dictionary to get ready for next set of data
for pricteainers in price_containers:
price = pricteainers.find('div', class_ ='vehicle-price').text
prices.append(price)
test_df = pandas.DataFrame({'Title': names, 'Price': prices, 'Year': year, 'Body Type': body_type, 'Mileage': milage, 'Engine Size': engine, 'HP': hp, 'Transmission': transmission, 'Petrol Type': petrol_type})
print(test_df.info())
# test_df.to_csv('Autotrader_test.csv')
Upvotes: 1
Views: 1096
Reputation: 1357
I followed the advice from David in the other answer's comment area.
Code:
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
pd.set_option('display.width', 1000)
pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
names = []
prices = []
year = []
body_type = []
milage = []
engine = []
hp = []
transmission = []
petrol_type = []
for i in range(1, 2):
response = get('https://www.autotrader.co.uk/car-search?sort=sponsored&seller-type=private&page=' + str(i))
html_soup = BeautifulSoup(response.text, 'html.parser')
outer = html_soup.find_all('article', class_='search-listing')
for inner in outer:
lis = []
names.append(inner.find_all('a', class_ ="js-click-handler listing-fpa-link")[1].text)
prices.append(inner.find('div', class_='vehicle-price').text)
for li in inner.find_all('ul', class_='listing-key-specs'):
for i in li.find_all('li')[-7:]:
lis.append(i.text)
year.append(lis[0])
body_type.append(lis[1])
milage.append(lis[2])
engine.append(lis[3])
hp.append(lis[4])
transmission.append(lis[5])
petrol_type.append(lis[6])
test_df = pd.DataFrame.from_dict({'Title': names, 'Price': prices, 'Year': year, 'Body Type': body_type, 'Mileage': milage, 'Engine Size': engine, 'HP': hp, 'Transmission': transmission, 'Petrol Type': petrol_type}, orient='index')
print(test_df.transpose())
Output:
Title Price Year Body Type Mileage Engine Size HP Transmission Petrol Type
0 Citroen C3 1.4 HDi Exclusive 5dr £500 2002 (52 reg) Hatchback 123,065 miles 1.4L 70bhp Manual Diesel
1 Volvo V40 1.6 XS 5dr £585 1999 (V reg) Estate 125,000 miles 1.6L 109bhp Manual Petrol
2 Toyota Yaris 1.3 VVT-i 16v GLS 3dr £700 2000 (W reg) Hatchback 94,000 miles 1.3L 85bhp Automatic Petrol
3 MG Zt-T 2.5 190 + 5dr £750 2002 (52 reg) Estate 95,000 miles 2.5L 188bhp Manual Petrol
4 Volkswagen Golf 1.9 SDI E 5dr £795 2001 (51 reg) Hatchback 153,000 miles 1.9L 68bhp Manual Diesel
5 Volkswagen Polo 1.9 SDI Twist 5dr £820 2005 (05 reg) Hatchback 106,116 miles 1.9L 64bhp Manual Diesel
6 Volkswagen Polo 1.4 S 3dr (a/c) £850 2002 (02 reg) Hatchback 125,640 miles 1.4L 75bhp Manual Petrol
7 KIA Picanto 1.1 LX 5dr £990 2005 (05 reg) Hatchback 109,000 miles 1.1L 64bhp Manual Petrol
8 Vauxhall Corsa 1.2 i 16v SXi 3dr £995 2004 (54 reg) Hatchback 81,114 miles 1.2L 74bhp Manual Petrol
9 Volkswagen Beetle 1.6 3dr £995 2003 (53 reg) Hatchback 128,000 miles 1.6L 102bhp Manual Petrol
Upvotes: 1
Reputation: 552
The ul
is not a child of the h2
. It's a sibling.
So you will need to make a separate selection because it's not part of the ad_containers
.
Upvotes: 1