Finn

Reputation: 13

TypeError: can't concat bytes to str

I am trying to get an example working as a tutorial. The objective of this code is to read a URL, pull a stock value for each company in a list, and print the value. What I have, pieced together from advice in other posts and the original code (Python 3), is:

import urllib.request
import re

Symbols = ['aapl', 'spy' , 'goog' , 'nflx']
i = 0
while i < len(Symbols):
    Yahoo='http://finance.yahoo.com/q?s=' + Symbols[i]
    htmlfile = urllib.request.urlopen(Yahoo)
    htmltext = htmlfile.read()
    pattern= re.compile(b'<span id="yfs_l84_'+ Symbols[i] +'">(.+?)</span>')
    price= re.findall(pattern, htmltext)
    print('The price of' + (Symbols[i]) + ' is ' + str(price))
    i+=1

I understand that the output from htmlfile.read() is bytes, so I need to make my regex pattern bytes as well (using the 'b' prefix). The error message I then get is:

Traceback (most recent call last):
  File "C:/Users/User/Documents/Raspberry Pi/Python/web scraper/web_scraper_v2.1.py", line 11, in <module>
    price= re.findall(pattern, htmltext)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

I suspect this is syntax related, but I can't work it out. Any suggestions?

Upvotes: 0

Views: 4613

Answers (1)

Nafiul Islam

Reputation: 82470

Here you go:

import urllib.request
import re

Symbols = ['aapl', 'spy', 'goog', 'nflx']
i = 0
while i < len(Symbols):
    Yahoo = 'http://finance.yahoo.com/q?s=' + Symbols[i]
    htmlfile = urllib.request.urlopen(Yahoo)
    htmltext = htmlfile.read()
    # Changed the string below so that we can resolve the problems with putting the
    # symbol value in the right place. str.format has been used instead
    # Also, the string has been made into a normal string again
    pattern = re.compile('<span id="yfs_l84_{symbol}">(.+?)</span>'.format(symbol=Symbols[i]))
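    # Here htmltext is decoded into a string so that it can be matched against the
    # str pattern without the TypeError. UTF-8 is assumed for the page encoding;
    # str(htmltext) would only give the bytes repr ("b'...'"), not the real text
    price = re.findall(pattern, htmltext.decode('utf-8', errors='replace'))
    print('The price of ' + Symbols[i] + ' is ' + str(price))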
    i += 1
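
If you would rather keep everything as bytes, the other option is to encode the symbol before concatenating, so that every piece of the pattern is bytes. This is only a minimal sketch: find_price is a hypothetical helper name, and the sample bytes below stand in for what htmlfile.read() returns, so it runs without hitting the network. I am also assuming the page is UTF-8 when decoding the matches.

```python
import re


def find_price(html_bytes, symbol):
    """Return every price captured for `symbol` from raw HTML bytes."""
    # Encode the (escaped) symbol so the whole pattern is bytes, not a mix of
    # bytes and str, which is what triggers the concatenation TypeError.
    pattern = re.compile(
        b'<span id="yfs_l84_' + re.escape(symbol).encode('ascii') + b'">(.+?)</span>'
    )
    # A bytes pattern returns bytes matches; decode them for printing.
    # UTF-8 is an assumption about the page encoding.
    return [m.decode('utf-8', errors='replace') for m in pattern.findall(html_bytes)]


# Hypothetical sample standing in for htmlfile.read()
sample = b'<html><span id="yfs_l84_aapl">123.45</span></html>'
print('The price of aapl is', find_price(sample, 'aapl'))
```

Either way the rule is the same: the pattern and the text being searched must both be str or both be bytes.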

Upvotes: 2
