Emil

Reputation: 25

Syntax error in web scraping program using beautifulsoup, requests and regex

As part of 'Automate the Boring Stuff', I am trying to learn how to code in Python. One of the exercises is to create a web scraper using BeautifulSoup and requests.

I decided to try Amazon's stock price instead of the price of a product on Amazon. I managed to get it to work, but the output spanned several lines.

So I wanted to use a regex to return just the stock price, without the gain/loss and the timestamp.

However, it kept giving me syntax errors on line 1. I tried removing the regex part, going back to just the bs4 and requests code I started with, but that still gave me the syntax error (I am using VS Code to avoid parenthesis errors).

Where am I going wrong? And, depending on how wrong it is, what would the correct code look like?

My code currently looks like this:

import bs4, requests, re

def extractedStockPrice(price):
    stockPriceRegex = re.compile(r'''
    [0-9]?
    ,?
    [0-9]+
    /.
    [0-9]*
    ''', re.VERBOSE)
    return stockPriceRegex.search(price)

def getStockPrice(stockUrl):
    res = requests.get(stockUrl)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('#quote-header-info > div.My\(6px\).Pos\(r\).smartphone_Mt\(6px\)')
    return elems[0].text.strip()

stockPrice = extractedStockPrice(getStockPrice('https://finance.yahoo.com/quote/AMZN?p=AMZN&.tsrc=fin-srch'))
print('The price is ' + stockPrice)

Upvotes: 1

Views: 176

Answers (1)

Jesper - jtk.eth

Reputation: 7484

The issue seems to be with your regex in the function extractedStockPrice. It does not match the price string, so search returns None, which causes the type error mentioned in the comment.

The price string, when it reaches the regex, looks like this (example):

'2,042.76-0.24 (-0.01%)At close: 4:00PM EDT'
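To see the failure without hitting the network, here is a minimal sketch that runs the pattern from the question against the example string above. It assumes that string is representative of what the page returns; the names sample, original_pattern and message are only for illustration.

```python
import re

# Example text copied from above; the live page may return something different.
sample = '2,042.76-0.24 (-0.01%)At close: 4:00PM EDT'

# Pattern from the question: "/." means a literal "/" followed by any character.
original_pattern = re.compile(r'''
    [0-9]?
    ,?
    [0-9]+
    /.
    [0-9]*
    ''', re.VERBOSE)

match = original_pattern.search(sample)
print(match)  # None, because the sample contains no "/"

# Concatenating a str with None is what raises the TypeError.
try:
    message = 'The price is ' + match
except TypeError as err:
    message = None
    print('TypeError:', err)
```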

You can use a regex tester to check your pattern: https://www.regexpal.com/ (paste the above string as "Test String" and your regex as "Regular Expression").

It looks like your forward slash should be a backslash: "/." matches a literal slash followed by any character, whereas "\." matches a literal dot. Also, you need to extract the matched text once a match is found; you can do this with group(0) (see https://docs.python.org/3/library/re.html and search for re.search).

The code below should work (run with Python 3.7):
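As a quick offline check of both points, the sketch below applies the corrected pattern to the example string and shows that search returns a match object rather than a string, so group(0) is still needed before concatenating. The names sample, fixed_pattern and price are only for illustration.

```python
import re

# Example text from this answer; the live page may return something different.
sample = '2,042.76-0.24 (-0.01%)At close: 4:00PM EDT'

# "\." matches a literal dot.
fixed_pattern = re.compile(r'[0-9]?,?[0-9]+\.[0-9]*')

match = fixed_pattern.search(sample)
print(isinstance(match, str))   # False: search returns a match object, not a str

price = match.group(0)          # the text that was actually matched
print('The price is ' + price)  # The price is 2,042.76
```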

import bs4, requests, re

def extractedStockPrice(price):
    # fixes here:
    #   1) use backslash "\" instead of "/".
    #   2) use ".group(0)" to extract match.
    stockPriceRegex = re.compile(r'''[0-9]?,?[0-9]+\.[0-9]*''', re.VERBOSE)
    return stockPriceRegex.search(price).group(0)

def getStockPrice(stockUrl):
    res = requests.get(stockUrl)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # raw string so the backslashes in the CSS selector are passed through unchanged
    elems = soup.select(r'#quote-header-info > div.My\(6px\).Pos\(r\).smartphone_Mt\(6px\)')
    return elems[0].text.strip()

stockPrice = extractedStockPrice(getStockPrice('https://finance.yahoo.com/quote/AMZN?p=AMZN&.tsrc=fin-srch'))

print('The price is ' + stockPrice)

Result: "The price is 2,042.76".
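One caveat: the pattern above allows only a single digit before the comma, so a price of 10,000 or more would be cut short, and .group(0) still fails if nothing matches. A possible, more defensive variant is sketched below; PRICE_PATTERN and extract_price are hypothetical names, and the second test string is made up for illustration.

```python
import re
from typing import Optional

# Pattern from the code above, kept for comparison.
ANSWER_PATTERN = re.compile(r'[0-9]?,?[0-9]+\.[0-9]*')

# Hypothetical alternative: digits, optional ",ddd" groups, then a decimal part.
PRICE_PATTERN = re.compile(r'\d+(?:,\d{3})*\.\d+')


def extract_price(text: str) -> Optional[str]:
    """Return the first price-like substring, or None when nothing matches."""
    match = PRICE_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_price('2,042.76-0.24 (-0.01%)At close: 4:00PM EDT'))  # 2,042.76
print(extract_price('12,042.76-0.24 (-0.01%)'))                     # 12,042.76
print(extract_price('no price here'))                               # None

# The pattern from the code above drops the leading digit here.
print(ANSWER_PATTERN.search('12,042.76-0.24 (-0.01%)').group(0))    # 2,042.76
```

Returning None instead of raising lets the caller decide what to print when the page layout changes and no price is found.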

Upvotes: 1
