Hans Muller

Reputation: 1

How to fix KeyError in Python using Beautiful Soup?

I am having trouble with the following code. For every product, I want to extract the title, the URL, the image URL, and the product number, and then write the data to an Excel spreadsheet.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://b2b.pmsinternational.com/search/?q=&submit=Search+Product+Name'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

items = soup.find_all('section', attrs={'class': 'products'})

rows = []
columns = ['product title', 'Item Number', 'Product URL','Image URL']

for item in items:
    product_url = item.a['href']
    product_image_url = item.a['src']
    product_title = item.a['title']
    product_number = item.a['ref']
    
    row = [product_title, product_number, product_url, product_image_url]
    rows.append(row)
    
df = pd.DataFrame(rows, columns=columns)
df.to_excel('PMS International Products.xlsx', index=False)
    
print('File Saved')

Error:

Traceback (most recent call last):
  File "C:/Users/hansm/PycharmProjects/Scraping/main.py", line 17, in <module>
    product_image_url = item.a['src']
  File "C:\Users\hansm\PycharmProjects\Scraping\venv\lib\site-packages\bs4\element.py", line 1406, in __getitem__
    return self.attrs[key]
KeyError: 'src'

Upvotes: 0

Views: 891

Answers (2)

Ram

Reputation: 4779

You are getting a KeyError because you are looking up the keys src, title and ref, which are not present on the <a> tags.

  • The product URL is in the <a> tag's href attribute. You are getting that right.
  • The product image URL is in the <img> tag's src attribute, not on the <a> tag (<a> tags have no src attribute).
  • The product title is inside a <span> tag with class title, not in a title attribute on the <a> tag (the <a> tags have no title attribute).
  • The item number is inside a <span> tag with class ref, not on the <a> tag (the <a> tags have no ref attribute).
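To make the points above concrete, here is a minimal sketch run against a made-up HTML fragment. The markup, paths, and values are assumptions shaped to match the structure described above; they are not copied from the real page, which may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described above.
html = """
<li>
  <a href="/product/123">
    <img src="/images/123.jpg">
    <span class="title"> Toy Car </span>
    <span class="ref">Item No: 123</span>
  </a>
</li>
"""

item = BeautifulSoup(html, "html.parser").li

# The <a> tag only carries href, so asking it for src raises KeyError.
try:
    item.a["src"]
    raised = False
except KeyError:
    raised = True

# Read each value from the tag that actually holds it.
product_url = item.a["href"]
product_image_url = item.img["src"]
product_title = item.find("span", class_="title").text.strip()
product_number = item.find("span", class_="ref").text.strip()
```

The same lookups should carry over to the real page as long as its markup follows this shape.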

Also, your code only gets the data of the first product on the web page, because item.a returns only the first <a> tag inside each matched section. I assume you need the data for all the products.

The code below gets the data for all products.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://b2b.pmsinternational.com/search/?q=&submit=Search+Product+Name'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# The products sit in <ul>/<li> lists inside <section class="products">.
section_products = soup.find('section', attrs={'class': 'products'})
uls = section_products.find_all('ul')

rows = []
columns = ['product title', 'Item Number', 'Product URL','Image URL']

for ul in uls:
    for item in ul.find_all('li'):
        # Read each value from the tag that actually holds it.
        product_url = item.a['href']
        product_image_url = item.img['src']
        product_title = item.find('span', class_='title').text.strip()
        product_number = item.find('span', class_='ref').text.strip()
        rows.append([product_title, product_number, product_url, product_image_url])

# Write the collected rows to the spreadsheet, as in the question.
df = pd.DataFrame(rows, columns=columns)
df.to_excel('PMS International Products.xlsx', index=False)
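If some list items turn out not to be products (a banner, for example), find() returns None for the missing parts and the loop stops with an error. A possible guard is sketched below. The helper name extract_row and the sample markup are assumptions made for illustration, not something taken from the page.

```python
from bs4 import BeautifulSoup


def extract_row(item):
    """Return [title, number, url, image_url] for one <li>, or None if a part is missing."""
    link = item.find("a")
    image = item.find("img")
    title = item.find("span", class_="title")
    ref = item.find("span", class_="ref")
    # find() returns None when nothing matches, so check before dereferencing.
    if any(part is None for part in (link, image, title, ref)):
        return None
    return [title.text.strip(), ref.text.strip(), link.get("href"), image.get("src")]


# Hypothetical markup: one product and one non-product list item.
html = """
<ul>
  <li><a href="/p/1"><img src="/i/1.jpg"><span class="title">Ball</span><span class="ref">1001</span></a></li>
  <li class="banner">No product here</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for li in soup.find_all("li"):
    row = extract_row(li)
    if row is not None:
        rows.append(row)
```

The resulting rows list has the same column order as the columns list in the question, so it can be passed to pd.DataFrame(rows, columns=columns) unchanged.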

Upvotes: 0

MendelG

Reputation: 20088

The src attribute is on an img tag that's inside the a tag. You first need to find() the img tag and then access its src attribute.

Instead of:

product_image_url = item.a['src']

use:

product_image_url = item.a.find('img')['src']

The same goes for product_title. Instead of:

product_title = item.a['title']

use:

product_title = item.a.find('img')["title"]

As for product_number, I don't see a ref attribute on the a tag, so

product_number = item.a['ref']

raises a KeyError as well.
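If an attribute may or may not be present, one option is to read it with the tag's get() method, which returns None (or a default you supply) instead of raising KeyError. A minimal sketch on a made-up fragment follows; the markup and values are assumptions, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the <a> tag has href but no ref attribute.
html = '<a href="/product/123"><img src="/images/123.jpg" title="Toy Car"></a>'
link = BeautifulSoup(html, "html.parser").a

# get() returns None for a missing attribute instead of raising KeyError.
product_number = link.get("ref")

# A default can be supplied as the second argument.
product_number_or_blank = link.get("ref", "")

# has_attr() checks for presence without reading the value.
has_ref = link.has_attr("ref")

# Attributes that do exist are returned as usual.
image_title = link.find("img").get("title")
```

This only avoids the exception. The item number still has to be read from wherever the page actually stores it.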

Upvotes: 1
