Reputation: 621
I am trying to extract specific 'dd' element from the website using Python
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
'AppleWebKit/537.36 (KHTML, like Gecko) '\
'Chrome/75.0.3770.80 Safari/537.36'}
url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
vehicle=[]
for i in soup.findAll("div", class_="message-userExtras"):
for item in soup.find_all("dd")[::-1]:
vehicle.append(item.get_text())
print(vehicle)
I am trying to extract only vehicle list from the url and my output should be as follows
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
But my result is not what I expect it to be
Upvotes: 2
Views: 363
Reputation: 33384
Use regular expression re and search the dt
tag with text Vehicle
and then find the next dd
tag.
import re
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
'AppleWebKit/537.36 (KHTML, like Gecko) '\
'Chrome/75.0.3770.80 Safari/537.36'}
url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
for item in soup.find_all("div",class_='message-userExtras'):
print(item.find('dt',text=re.compile("Vehicle")).find_next('dd').text.strip())
Output:
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat - 2019 Honda CRV Touring
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat SuperCab
2019 Ranger Lariat
Ranger Lariat
2019 Ford Ranger Lariat
Ranger Lariat
Ranger Lariat
2019 Ranger XLT 301A SuperCrew 4X4 2015 Ecoboost Mustang 50 Year Appereance Package convertible
Upvotes: 2