Issues while scraping html with bs4

Question

I'm trying to scrape the following HTML with the Python bs4 script below, but I keep getting the error listed further down and have no idea what's causing it. If someone could help me get it working, that would be great!

£24.73

Python BS4 script:

prices = {

    "GLDAG_MAPLE":        {"url":    "https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/",
                           "trader": "Gold.co.uk",
                           "metal":  "Silver",
                           "type":   "Maple"},
    "BBPAG_MAPLE":        {"url": "https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/",
                           "trader": "Bullion By Post",
                           "metal":  "Silver",
                           "type":   "Maple"},
    "ATKAG_BRITANNIA":    {"url": "https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin",
                           "trader": "Atkinsons Bullion",
                           "metal":  "Silver",
                           "type":   "Britannia"},
}

response = requests.get(
    'https://www.bullionbypost.co.uk/silver-price/silver-price-per-gram/')
soup = BeautifulSoup(response.text, 'html.parser')
AG_GRAM_SPOT = soup.find(
    'span', {'name': 'current_price_field'}).get_text()

# Convert to float
AG_GRAM_SPOT = float(re.sub(r"[^0-9\.]", "", AG_GRAM_SPOT))
# No need for another lookup
AG_OUNCE_SPOT = AG_GRAM_SPOT * 31.1035

for coin in prices:
    response = requests.get(prices[coin]["url"])
    soup = BeautifulSoup(response.text, 'html.parser')

    try:
        text_price = soup.find(
            'td', {'id': 'price-inc-vat-per-unit-1'}).get_text()         # BullionByPost
    except:
        text_price = soup.find(
            'td', {'id': 'total-price-inc-vat-1'}).get_text()            # Gold.co.uk
    else:
        text_price = soup.find(
            'span', {'class': 'prodInfoPriceVat'}).get_text()         # Issues here! Line 70

    # Grab the number
    prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))

I keep getting this error:

Traceback (most recent call last):
  File "scraper.py", line 70, in <module>
    text_price = soup.find(
AttributeError: 'NoneType' object has no attribute 'get_text'

How can I get this working?

Andrej Kesely · Accepted Answer

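There is no need for exceptions here. In a try/except/else statement the else block runs whenever the try block succeeds, so after a successful BullionByPost lookup the script goes on to search for the Atkinsons span, soup.find() returns None, and calling .get_text() on it raises the AttributeError. Instead, call soup.find() for each selector in turn and check whether the result is None before using it.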
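
The failure can be reproduced without any network access. The sketch below is an illustration only: `lookup` is a hypothetical stand-in for `soup.find()` that returns a string or None, the dict keys are made-up labels for the three sites, and `.strip()` plays the role of `.get_text()`. It mirrors the try/except/else shape of the question's loop body.

```python
def lookup(page, key):
    # Hypothetical stand-in for soup.find(): the stored text, or None if absent.
    return page.get(key)


def price_text(page):
    # Same control flow as the try/except/else in the question.
    try:
        text = lookup(page, "bullionbypost").strip()
    except AttributeError:
        # Runs only when the first lookup returned None.
        text = lookup(page, "gold").strip()
    else:
        # Runs whenever the try block SUCCEEDED, discarding its result.
        text = lookup(page, "atkinsons").strip()
    return text


# A page matching only the second selector happens to work.
print(price_text({"gold": " £31.32 "}))

# A page matching the first selector fails inside the else branch.
try:
    price_text({"bullionbypost": "£26.88"})
except AttributeError as exc:
    print("else branch failed:", exc)
```

Only the page that matches the second selector gets through; the other two cases end up calling a method on None.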

For example, here is the full script with explicit checks on each lookup:

import re
import requests
from bs4 import BeautifulSoup

prices = {

    "GLDAG_MAPLE":        {"url":    "https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/",
                           "trader": "Gold.co.uk",
                           "metal":  "Silver",
                           "type":   "Maple"},
    "BBPAG_MAPLE":        {"url": "https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/",
                           "trader": "Bullion By Post",
                           "metal":  "Silver",
                           "type":   "Maple"},
    "ATKAG_BRITANNIA":    {"url": "https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin",
                           "trader": "Atkinsons Bullion",
                           "metal":  "Silver",
                           "type":   "Britannia"},
}

response = requests.get(
    'https://www.bullionbypost.co.uk/silver-price/silver-price-per-gram/')
soup = BeautifulSoup(response.text, 'html.parser')
AG_GRAM_SPOT = soup.find(
    'span', {'name': 'current_price_field'}).get_text()

# Convert to float
AG_GRAM_SPOT = float(re.sub(r"[^0-9\.]", "", AG_GRAM_SPOT))
# No need for another lookup
AG_OUNCE_SPOT = AG_GRAM_SPOT * 31.1035

for coin in prices:
    print('url=', prices[coin]["url"])
    response = requests.get(prices[coin]["url"])
    soup = BeautifulSoup(response.text, 'html.parser')

    text_price = soup.find(
        'td', {'id': 'price-inc-vat-per-unit-1'})        # BullionByPost

    if not text_price:
        text_price = soup.find(
            'td', {'id': 'total-price-inc-vat-1'})       # Gold.co.uk

    if not text_price:
        text_price = soup.find(
            'span', {'class': 'prodInfoPriceVat'})       # atkinsonsbullion.com

    if not text_price:
        print('Error, unable to find price for url=', prices[coin]["url"])
        prices[coin]["price"] = float('nan')
        continue

    text_price = text_price.get_text(strip=True)

    # Grab the number
    prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))
    print('price=', prices[coin]["price"])

Prints:

url= https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/
price= 31.32
url= https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/
price= 26.88
url= https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin
price= 24.73
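
The loop body does two things: take the first lookup that returns something, then turn the text into a number. As a rough sketch, these could be pulled into two small helpers. Both names (`first_found`, `parse_price`) are hypothetical and not part of bs4. The lookups are passed as zero-argument callables so that later ones only run if earlier ones return None.

```python
import re


def first_found(lookups):
    """Return the first non-None result from a sequence of zero-argument callables."""
    for lookup in lookups:
        result = lookup()
        if result is not None:
            return result
    return None


def parse_price(text):
    """Return the first number in text such as '£1,234.56' as a float, or NaN."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return float("nan")
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


# With a real page, the lookups would wrap soup.find(), for example:
#   first_found([
#       lambda: soup.find('td', {'id': 'price-inc-vat-per-unit-1'}),
#       lambda: soup.find('td', {'id': 'total-price-inc-vat-1'}),
#       lambda: soup.find('span', {'class': 'prodInfoPriceVat'}),
#   ])
# Here plain values are used so the sketch runs offline.
found = first_found([lambda: None, lambda: "£24.73"])
print(parse_price(found))
```

Unlike stripping every character except digits and dots, `parse_price` picks out the first number only, so trailing text such as "inc. VAT" does not leave a stray dot that would make float() fail.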
