Reputation: 23
I am using Python and Beautiful Soup to scrape the Newegg website and grab each product's price, name and shipping cost. However, when I run the program, the output contains only the first product entry from the website. Can anybody help me with what I'm doing wrong?
# import beautiful soup 4 and use urllib to import urlopen
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
# url where we will grab the product data
my_url = 'http://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
# open connection and grab the URL page information, read it, then close it
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# parse html from the page
page_soup = soup(page_html, "html.parser")
# find each product within the item-container class
containers = page_soup.findAll("div",{"class":"item-container"})
# write a file named products.csv with the data returned
filename = "products.csv"
f = open(filename, "w")
# create headers for products
headers = "price, product_name, shipping\n"
f.write(headers)
# define containers based on location on webpage and their DOM elements
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip("|")
    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text
    shipping_container = container.findAll("li",{"class":"price-ship"})
    shipping = shipping_container[0].text.strip()
# print each product with the price, product name and shipping cost
print("price: " + price)
print("product name: " + product_name)
print("shipping: " + shipping)
# when writing each section, add a comma, replace comma with pipe,
# add new line after shipping
f.write(price + "," + product_name.replace(",", "|") + "," + shipping + "\n")
f.close()
Upvotes: 2
Views: 682
Reputation: 41
The print and write statements should be put inside the for block.
# define containers based on location on webpage and their DOM elements
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip("|")
    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text
    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()
    # print each product with the price, product name and shipping cost
    print("price: " + price)
    print("product name: " + product_name)
    print("shipping: " + shipping)
    # when writing each section, add a comma, replace comma with pipe,
    # add new line after shipping
    f.write(price + "," + product_name.replace(",", "|") + "," + shipping + "\n")
# close the file once, after the loop has finished
f.close()
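If the pipe replacement feels fragile, one possible alternative is to let the standard csv module quote fields that contain commas. This is only a sketch: write_products and the sample rows are hypothetical stand-ins for the values pulled from each container, and an in-memory buffer is used in place of products.csv so the example runs without network access.

```python
import csv
import io


def write_products(products, out):
    """Write a header plus one CSV row per (price, name, shipping) tuple."""
    writer = csv.writer(out)
    writer.writerow(["price", "product_name", "shipping"])
    for price, name, shipping in products:
        # The write sits inside the loop, so it runs once per product.
        writer.writerow([price, name, shipping])


# Hypothetical sample values; in the real script they come from each container.
sample = [
    ("$149.99", "EVGA GeForce GTX 1050 SC GAMING, 02G-P4-6152-KR", "$4.99 Shipping"),
    ("$139.99", "ASUS GeForce GTX 1050 PH-GTX1050-2G Video Card", "$4.99 Shipping"),
]

buffer = io.StringIO()
write_products(sample, buffer)
print(buffer.getvalue())
```

With a real file you would pass open(filename, "w", newline="") as the out argument, which is what the csv documentation recommends.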
Upvotes: 1
Reputation: 1508
You need either another for loop around your call to f.write(), or to write within your first for loop.
You are only writing one 'product' to the file because that line of code is only executing once.
The easiest solution is to move
f.write(price + "," + product_name.replace(",", "|") + "," + shipping + "\n")
to right after
shipping = shipping_container[0].text.strip()
Be sure to indent it to match the rest of your for loop contents.
Do yourself a favor and read the Python docs: https://docs.python.org/3/
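To see why indentation decides how many times a statement runs, here is a minimal sketch with made-up data (items, outside and inside are illustrative names, not part of the scraper). A statement placed after the loop sees only whatever the loop variables held on the final iteration.

```python
items = ["a", "b", "c"]

# Statement after the loop: it runs once, using the value left over
# from the last iteration.
outside = []
for item in items:
    current = item.upper()
outside.append(current)

# Statement inside the loop: it runs once per iteration.
inside = []
for item in items:
    current = item.upper()
    inside.append(current)

print(outside)  # ['C']
print(inside)   # ['A', 'B', 'C']
```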
Upvotes: 0
Reputation: 71471
You can try this:
from bs4 import BeautifulSoup as soup
import requests
import re
s = soup(requests.get('http://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?', proxies={'http':'67.63.33.7:80'}).text, 'lxml')
# list() makes this work on Python 3 as well, where filter() returns a lazy iterator
new_data = [list(filter(lambda x: len(x) > 1, [re.sub(r'\s{4}', '', re.sub(r'[\n\r]+', '', b.text)) for b in i.find_all(re.compile('a|li'), {'class': re.compile('item-title|price-current|price-ship')})])) for i in s.find_all('div', {'class': "item-container"})]
Output:
[[u'GIGABYTE AORUS GeForce GTX 1080 Ti DirectX 12 GV-N108TAORUS X-11GD 11GB ...', u'$1,039.99\xa0\u2013'], [u'EVGA GeForce GTX 1050 SC GAMING, 02G-P4-6152-KR, 2GB GDDR5, DX12 OSD Support (PXOC)', u'|$149.99\xa0(9 Offers)\u2013', u'(9 Offers)', u'$4.99 Shipping'], [u'ASUS GeForce GTX 1050 PH-GTX1050-2G Video Card', u'|$139.99\xa0(6 Offers)\u2013', u'(6 Offers)', u'$4.99 Shipping'], [u'ZOTAC GeForce GTX 1050 DirectX 12 ZT-P10500A-10L Video Card', u'|$134.99\xa0(4 Offers)\u2013', u'(4 Offers)', u'$4.99 Shipping'], [u'MSI GeForce GTX 1050 DirectX 12 GTX 1050 2GT LP Video Cards', u'|$139.99\xa0(2 Offers)\u2013', u'(2 Offers)', u'$4.99 Shipping'], [u'XFX Radeon RX 560 DirectX 12 RX-560P4SFG5 Video Card', u'|$179.99\xa0\u2013', u'$4.99 Shipping'], [u'GIGABYTE Radeon RX 550 DirectX 12 GV-RX550D5-2GD Video Card', u'|$109.99\xa0(2 Offers)\u2013', u'(2 Offers)', u'$3.99 Shipping'], [u'ZOTAC GeForce GT 1030 2GB GDDR5 64-bit PCIe 3.0 DirectX 12 HDCP Ready Low Profile Video Card ZT-P10300A-10L', u'|$89.99\xa0(4 Offers)\u2013', u'(4 Offers)', u'$3.99 Shipping'], [u'MSI Radeon R7 250 DirectX 12 R7 250 2GD3 OC Video Card', u'(2 Offers)', u'$3.99 Shipping'], [u'EVGA GeForce GTX 1050 SSC GAMING ACX 3.0, 02G-P4-6154-KR, 2GB GDDR5, DX12 OSD Support (PXOC)', u'(4 Offers)', u'Free Shipping'], [u'ASUS GeForce GT 1030 2GB GDDR5 HDMI DVI Graphics Card (GT1030-2G-CSM)', u'(12 Offers)', u'Free Shipping'], [u'XFX Radeon RX 560 DirectX 12 RX-560P2SFG5 Video Card', u'|$139.99\xa0\u2013', u'$4.99 Shipping']]
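The price strings in that output still carry a leading pipe, a non-breaking space and a trailing dash, and some entries have no price at all. A possible cleanup step is sketched below; clean_price is a hypothetical helper, and the pattern assumes prices look like $1,039.99.

```python
import re


def clean_price(raw):
    """Return the first dollar amount found in raw, or '' if there is none."""
    match = re.search(r"\$[\d,]+(?:\.\d{2})?", raw)
    return match.group(0) if match else ""


# Strings copied from the output above.
print(clean_price("|$149.99\xa0(9 Offers)\u2013"))  # $149.99
print(clean_price("$1,039.99\xa0\u2013"))           # $1,039.99
print(repr(clean_price("(2 Offers)")))               # ''
```

Returning an empty string for a missing price keeps every row the same length, which matters if the rows are written to a CSV file afterwards.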
Upvotes: 0