Mega_maha

Reputation: 17

How to find the right "div" to web scrape with Python

I can't seem to find the right element with the browser's "Inspect" tool for Beautiful Soup to work with. I am trying to follow these guides, but I can't get past this point.

https://www.youtube.com/watch?v=XQgXKtPSzUI&t=119s Web scraping with Python

I am trying to scrape a website to compare four vehicles by safety features, maintenance cost, and price point. I am using Spyder (Python 3.6).

import bs4
from urllib import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.edmunds.com/car-comparisons/?veh1=401768437&veh2=401753723&veh3=401780798&veh4=401768504'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

#grabs each product
containers = page_soup.findAll("div", {"class":"feature-field-value"})

filename = "car_comparison.csv"
f = open(filename, "W")

Headers = "Title, product_name, shipping\n"
f.write(headers)

for container in containers:
    brand = container.div.div.a.img["title"]

    title_container = container.findAll("a", {"class":"vehicle-title f16-nm f14 lh22-nm lh18 pv2 sserif-2 tr-text minh90"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li",{"class":"price-ship"})
    shipping_container[0].text.strip()

    print("Title: " + title)
    print("product_name: " + product_name)
    print("shipping: " + shipping)

    f.write( brand + "," +product_name.replace(",", "|") + "," + shipping + "\n")

f.close()

#Criteria 1
#Safety = Warranty, Basic?
#Maintenance Cost = Maintenance
#Price = Base MSRP

I know I have to change quite a bit, but right now I just want it to run without errors.


runfile('C:/Users/st.s.mahathirath.ctr/.spyder-py3/temp.py', wdir='C:/Users/st.s.mahathirath.ctr/.spyder-py3')
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/st.s.mahathirath.ctr/.spyder-py3/temp.py', wdir='C:/Users/st.s.mahathirath.ctr/.spyder-py3')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/st.s.mahathirath.ctr/.spyder-py3/temp.py", line 2, in <module>
    from urllib import urlopen as uReq

ImportError: cannot import name 'urlopen'

Upvotes: 0

Views: 557

Answers (2)

Amitabh Das

Reputation: 403

Try the following code:

import bs4
from urllib.request import urlopen as uReq  # Python 3: urlopen lives in urllib.request
from bs4 import BeautifulSoup as soup

my_url = 'https://www.caranddriver.com/car-comparison-tool?chromeIDs=404121,402727,403989,403148'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {
    "class": "w50p"
})

I think you are trying to fetch the car comparisons. IMHO this might not work, since:

  • this won't work from the command line, as the site will report an unsupported browser;
  • IMHO the div is not the element to look for once the cars are compared on the site (there are too many divs). Try document.getElementsByTagName('cd-view-car-card') in the browser's debugger console to see 4 items (the last item is the "Add Car" item). Inside each cd-view-car-card there is a single div with 2 children; the second child (a div) contains all the relevant information (as per the current site design).
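
The same lookup can be sketched in Beautiful Soup, which accepts custom tag names such as cd-view-car-card. This is only an illustration: the sample_html string and its contents are made up to mirror the structure described above, and the real page may not return this markup at all to a non-browser client.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the described layout: each card holds a single
# div with two child divs, and the last card is the "Add Car" placeholder.
sample_html = """
<cd-view-car-card><div><div>photo</div><div>Car A: $20,000</div></div></cd-view-car-card>
<cd-view-car-card><div><div>photo</div><div>Car B: $25,000</div></div></cd-view-car-card>
<cd-view-car-card><div><div>icon</div><div>Add Car</div></div></cd-view-car-card>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Equivalent of document.getElementsByTagName('cd-view-car-card')
cards = page_soup.find_all("cd-view-car-card")

details = []
for card in cards[:-1]:  # skip the trailing "Add Car" card
    # Direct child divs of the card's single wrapper div
    children = card.div.find_all("div", recursive=False)
    # The second child is the one said to carry the relevant information
    details.append(children[1].get_text(strip=True))

print(details)
```

If the live page builds these cards with JavaScript, the HTML returned by urlopen will not contain them, and a browser-driven tool would be needed instead.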

Hope this helps

Upvotes: 1

JustLudo

Reputation: 1790

from urllib.request urlopen as uReq

This looks like a typo to me? Maybe you meant:

from urllib.request import urlopen as uReq
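
As a rough sketch of how the corrected import fits the fetch step from the question: in Python 3, urlopen lives in urllib.request, and the response can be used as a context manager so it is closed automatically. The fetch_html helper and the data: URL are assumptions made here so the snippet runs without network access; replace the data: URL with my_url to fetch the real page.

```python
from urllib.request import urlopen as uReq  # Python 3 location of urlopen


def fetch_html(url):
    """Hypothetical helper: open the URL, read the raw bytes, close the connection."""
    with uReq(url) as client:
        return client.read()


# Offline stand-in for the real page: a data: URL carrying a tiny HTML snippet.
page_html = fetch_html("data:text/html,%3Cdiv%3Ehello%3C/div%3E")
print(page_html)
```

The bytes returned here can be passed to soup(page_html, "html.parser") exactly as in the question.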

Upvotes: 0
