Reputation: 31
The aim of the script is to visit a website and build a list of links to all of its products, using Selenium and get_attribute.
Using requests, I then visit each of these links to load the product page, and scrape it with BeautifulSoup, storing each characteristic in its own variable.
My issue is that some of the products, I believe, do not have one of the fields I am scraping for, although most of them do. Is there a way to return something like "N/A" for products that don't have the characteristic I am scraping?
Here is my code:
import time
import csv
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup
all_product = []
url = "https://www.vatainc.com/infusion.html?limit=all"
service = service.Service('/Users/Jonathan/Downloads/chromedriver.exe')
service.start()
capabilities = {'chrome.binary': '/Google/Chrome/Application/chrome.exe'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]
for link in links:
    html = requests.get(link).text
    soup = BeautifulSoup(html, "html.parser")
    products = soup.findAll("html")
    for product in products:
        name = product.find("div", {"class": "product-name"}).text.strip('\n\r\t": ')
        manufacturing_SKU = product.find("span", {"class": "i-sku"}).text.strip('\n\r\t": ')
        manufacturer = product.find("p", {"class": "manufacturer"}).text.strip('\n\r\t": ')
        description = product.find("div", {"class": "std description"}).text.strip('\n\r\t": ')
        included_products = product.find("div", {"class": "included_parts"}).text.strip('\n\r\t": ')
        price = product.find("span", {"class": "price"}).text.strip('\n\r\t": ')
        all_product.append([name, manufacturing_SKU, manufacturer, description, included_products, price])
print(all_product)
Here is the error I get:
AttributeError Traceback (most recent call last)
<ipython-input-25-36feec64789d> in <module>()
34 manufacturer = product.find("p", {"class": "manufacturer"}).text.strip('\n\r\t": ')
35 description = product.find("div", {"class": "std description"}).text.strip('\n\r\t": ')
---> 36 included_products = product.find("div", {"class": "included_parts"}).text.strip('\n\r\t": ')
37 price = product.find("span", {"class": "price"}).text.strip('\n\r\t": ')
38 all_product.append([name, manufacturing_SKU, manufacturer, description, included_products, label, price])
AttributeError: 'NoneType' object has no attribute 'text'
Upvotes: 0
Views: 43
Reputation: 3035
The find() method on your BeautifulSoup object returns None when it can't find a DOM element that matches your query. Specifically, on the included_products line, it can't find a div with class included_parts.
You can do something like this to get an included_products value of None in that case:
def find_with_class(soup, tag_type, class_name):
    elements = soup.find(tag_type, {'class': class_name})
    if elements:
        return elements.text.strip('\n\r\t": ')
    else:
        return None

included_products = find_with_class(product, 'div', 'included_parts')
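Since the question asks for "N/A" rather than None, and any of the six fields could be missing on a given page, the same idea can be applied to every field with a configurable default. The following is only a sketch: the names FIELDS, find_text, scrape_product and demo are made up for illustration, the tag/class pairs are copied from the question's code, and the HTML in demo() is invented sample markup, not the real page.

```python
# Sketch only: FIELDS, find_text, scrape_product and demo are hypothetical names.
# The (tag, class) pairs are copied from the question's code.
FIELDS = [
    ("name", "div", "product-name"),
    ("manufacturing_SKU", "span", "i-sku"),
    ("manufacturer", "p", "manufacturer"),
    ("description", "div", "std description"),
    ("included_products", "div", "included_parts"),
    ("price", "span", "price"),
]

# Same characters the question strips from each value.
STRIP_CHARS = '\n\r\t": '


def find_text(node, tag_type, class_name, default="N/A"):
    """Return the stripped text of the first match, or `default` if there is none."""
    element = node.find(tag_type, {"class": class_name})
    if element is None:
        return default
    return element.text.strip(STRIP_CHARS)


def scrape_product(node, default="N/A"):
    """Build one row in the same column order as the question's all_product list."""
    return [find_text(node, tag, cls, default) for _, tag, cls in FIELDS]


def demo():
    # Imported here so the helpers above do not require bs4 at import time.
    from bs4 import BeautifulSoup

    # Invented sample markup with only two of the six fields present.
    html = (
        '<html><div class="product-name">Pump</div>'
        '<span class="price">$10.00</span></html>'
    )
    soup = BeautifulSoup(html, "html.parser")
    return scrape_product(soup)
```

In the question's loop this would replace the six product.find(...) lines with all_product.append(scrape_product(product)); any field that is absent from a page then comes back as "N/A" instead of raising AttributeError.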
Upvotes: 0