Reputation: 1
I am creating a small web scraping program in Python that takes GPU information from newegg.com and notes down all of the prices.
I have not implemented the spreadsheet output yet, because every time I run the script I get one of two errors.
The code is below:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import numpy as np
myURL = "https://www.newegg.com/global/uk/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=96&order=BESTMATCH" # defining my url as a variable
uClient = uReq(myURL) #opening the connection
page_html = uClient.read() # getting html
uClient.close() # closing the client
page_soup = soup(page_html, "html.parser") # html parsing
containers = page_soup.findAll("div", {"class":"item-container"}) # get all item containers/products
container = containers[0]
count = 0
for container in containers:
    print(count)
    brand = container.div.div.a.img["title"]# get the brand of the card
    if brand == None:
        print("N/A")
    else:
        print(brand)
    title_container = container.findAll("a", {"class", "item-title"})
    product_name = title_container[0].text # getting the product name
    if product_name == None:
        print("N/A")
    else:
        print(product_name)
    price1 = container.find("div",{"class":"item-action"})
    price1 = price1.ul
    price2 = price1.find("li", {"class": "price-current"}).contents #defining the product price
    if not price2:
        print("N/A")
    else:
        print(price2[2])
        print(price2[3].text)
        print(price2[4].text)
    print()
    count+=1
The errors say the following:
Traceback (most recent call last):
  File "C:/Users/Ethan Price/Desktop/test.py", line 23, in <module>
    brand = container.div.div.a.img["title"]# get the brand of the card
TypeError: 'NoneType' object is not subscriptable

Traceback (most recent call last):
  File "C:/Users/Ethan Price/Desktop/test.py", line 43, in <module>
    print(price2[2])
IndexError: list index out of range
To fix it, I tried converting the list into an array and changing the if statements.
Upvotes: 0
Views: 168
Reputation: 191748
For the first error, check that the image tag and its title attribute exist before reading the title:
brand = None
# might want to check there is even an anchor tag
_img = container.div.div.a.img
if _img:
    brand = _img.get("title")  # None if the title attribute is missing
For the second error, check the length of the price list before indexing into it:
if 2 <= len(price2) <= 5:
    for p in price2[2:]:
        print(p)
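Putting both checks together, here is a minimal sketch of a guarded parser. The SAMPLE_HTML markup and the parse_item helper are my own inventions for illustration, not Newegg's real page structure, and I use find() rather than the chained .div.div.a.img lookup so that a missing tag gives None instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the nesting used in the question.
SAMPLE_HTML = """
<div class="item-container">
  <div><div><a><img title="ExampleBrand"></a></div></div>
  <a class="item-title">Example Card 8GB</a>
  <div class="item-action"><ul>
    <li class="price-current"><strong>199</strong><sup>.99</sup></li>
  </ul></div>
</div>
<div class="item-container">
  <div><div><a>no image here</a></div></div>
  <a class="item-title">Card without brand or price</a>
  <div class="item-action"><ul><li class="price-current"></li></ul></div>
</div>
"""


def parse_item(container):
    """Return (brand, name, price), using None for any piece that is missing."""
    img = container.find("img")  # None when the container has no <img>
    brand = img.get("title") if img is not None else None

    title = container.find("a", {"class": "item-title"})
    name = title.get_text(strip=True) if title is not None else None

    price_li = container.find("li", {"class": "price-current"})
    price = price_li.get_text(strip=True) if price_li is not None else ""
    return brand, name, price or None  # an empty price string becomes None


page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = [parse_item(c) for c in page_soup.find_all("div", {"class": "item-container"})]

for brand, name, price in items:
    print(brand or "N/A", name or "N/A", price or "N/A")
```

The second container has neither an image nor a price, which are the two cases that crash the original loop; here they simply print as N/A.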
Upvotes: 0
Reputation: 54213
Both error messages mean that some element you expect to see doesn't exist. The first is complaining that container.div.div.a.img is None when you try to subscript it (and None can't be subscripted, for obvious reasons). The other is complaining that the list price2 isn't as long as you think it is, so price2[2] is out of range.
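Both failures can be reproduced without any scraping at all, which may help when debugging. In this small sketch the names missing_img and short_contents are made-up stand-ins for what the lookups return on a product that lacks an image or a full price.

```python
# Stand-ins (hypothetical values) for what the scraper gets on an incomplete product.
missing_img = None        # result of container.div.div.a.img when there is no <img>
short_contents = ["\n"]   # a .contents list with fewer entries than the code expects

errors = []

try:
    missing_img["title"]  # subscripting None
except TypeError as exc:
    errors.append(type(exc).__name__)

try:
    short_contents[2]     # index past the end of the list
except IndexError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['TypeError', 'IndexError']
```

So the fix in both cases is to test what came back (is it None, is it long enough) before using it.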
Upvotes: 2