Reputation: 33
I've been at this for hours and keep getting an error. All I'm trying to do is scrape the product name, brand, price, and shipping price. I have successfully scraped each of them; the only issue comes when I try to scrape the price and loop through every item on the webpage. I have a separate file in which I scraped the price successfully. This is my code trying to put everything together, and this is the error I get. Please help!
# coding=utf-8
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007709%2050001419%2050001315%2050001402%2050001312%2050001669%2050012150%2050001561%2050001314%2050001471%20600566292%20600566291%20600565504%20601201888%20601204369%20601210955%20601203793%204814%20601296707&IsNodeId=1&cm_sp=Cat_video-Cards_1-_-Visnav-_-Gaming-Video-Cards_1'
# This grabs the webpage and downloads it!
uClient = uReq(url)
# This is so i can read everything out of the url!
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
# Grabs each product!
containers = page_soup.findAll("div", {"class": {"item-container", "item-action"}})
# set up the loop to get the brand of the item!
for container in containers:
    brand_container = container.findAll("a", {"class":"item-brand"})
    brand = container.div.div.a.img["title"]
    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text
    price_container = container.findAll("li", {"class":"price-current"})
    price = container.strong.text
    shipping_container = page_soup.findAll("li", {"class": "price-ship"})
    shipping = shipping_container[0].text.strip()
    print("Product_name: " + product_name)
    print("Brand: " + brand)
    print("Price: " + price)
    print("Shipping: " + shipping)
Upvotes: 0
Views: 387
Reputation: 393
The AttributeError is raised because a tag does not have the subtag you are looking for (e.g. container does not have a .div). The cause is this line:
containers = page_soup.findAll("div", {"class": {"item-container", "item-action"}})
This makes containers hold every item-container div and every item-action div. The item-action divs are not the containers you want to iterate through. If you change that line to:
containers = page_soup.findAll("div", {"class": {"item-container"}})
then it should parse correctly.
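To see the difference, here is a minimal sketch run against a made-up HTML fragment. The markup is an assumption that only loosely mimics a product listing; it is not the real Newegg page. It suggests that asking for both classes also returns the inner item-action div, and that div has no nested div, so chaining attributes on it runs into None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on a product listing; not the real page.
SAMPLE_HTML = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand" href="#"><img title="EVGA"></a>
    </div>
    <a class="item-title" href="#">Sample Card</a>
  </div>
  <div class="item-action">
    <ul><li class="price-current">$<strong>299</strong></li></ul>
  </div>
</div>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same selector as in the question: matches divs with EITHER class.
both = page_soup.find_all("div", {"class": {"item-container", "item-action"}})
# Narrowed selector: matches only the product containers.
only_items = page_soup.find_all("div", {"class": "item-container"})

print(len(both))        # the product div plus its inner item-action div
print(len(only_items))  # just the product div

# The item-action div contains no nested div, so .div is None here;
# a further .div or .a on that None is what raises AttributeError.
action_div = both[1]
print(action_div.div)
```

With the narrowed selector, every element in the loop is a real product container.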
Finally, you should change
brand = container.div.div.a.img["title"]
to:
brand = brand_container[0].img["title"]
since findAll returns a list of matches rather than a single tag.
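Putting the pieces together, the loop could look roughly like the sketch below. It runs on a made-up HTML string instead of the live page, and the helper names text_or_default and parse_products are hypothetical, introduced only for illustration. Two things differ from the question's loop: every lookup is scoped to container (the question searched page_soup for shipping, which would return the first product's shipping for every item), and missing tags fall back to a default instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two products; the real page may be structured differently.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-brand" href="#"><img title="EVGA"></a>
  <a class="item-title" href="#">Card A</a>
  <ul>
    <li class="price-current">$<strong>299</strong><sup>.99</sup></li>
    <li class="price-ship"> Free Shipping </li>
  </ul>
</div>
<div class="item-container">
  <a class="item-title" href="#">Card B</a>
  <ul>
    <li class="price-current">$<strong>149</strong><sup>.99</sup></li>
    <li class="price-ship"> $4.99 Shipping </li>
  </ul>
</div>
"""


def text_or_default(tag, default="N/A"):
    """Return the stripped text of a tag, or a default when the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else default


def parse_products(html):
    """Collect name, brand, price, and shipping for each product container."""
    page_soup = BeautifulSoup(html, "html.parser")
    products = []
    for container in page_soup.find_all("div", {"class": "item-container"}):
        # Search inside this container only, never the whole page.
        brand_link = container.find("a", {"class": "item-brand"})
        brand_img = brand_link.img if brand_link is not None else None
        brand = brand_img.get("title", "N/A") if brand_img is not None else "N/A"

        price_item = container.find("li", {"class": "price-current"})
        price_tag = price_item.strong if price_item is not None else None

        products.append({
            "product_name": text_or_default(container.find("a", {"class": "item-title"})),
            "brand": brand,
            "price": text_or_default(price_tag),
            "shipping": text_or_default(container.find("li", {"class": "price-ship"})),
        })
    return products


products = parse_products(SAMPLE_HTML)
for product in products:
    print(product)
```

The second sample product has no brand image on purpose, to show that the loop keeps going instead of stopping on the first incomplete item.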
Upvotes: 2