Reputation: 875
Basically, I am trying to strip all HTML tags from a list of HTML files. When I try to do this, I get the error:
TypeError: expected string or bytes-like object.
So is there a way to iterate over a list with regex?
Here is the code I am using:
import pymssql
import re
conn = pymssql.connect(
host='xxx',
port=xxx,
user='xxx',
password='xxx',
database='xxxx'
)
cursor = conn.cursor()
cursor.execute('SELECT column FROM table')  # column and table are placeholders
text = cursor.fetchall()
conn.close()
raw = []
raw.append(text)
str(raw)
x = re.sub('<[^<]+?>', '', raw)
Upvotes: 1
Views: 928
Reputation: 315
Check out the BeautifulSoup package. It's an HTML parser, and the attributes of the tags it returns can be accessed much like a normal Python dictionary.
Upvotes: 0
Reputation: 61930
The error:
TypeError: expected string or bytes-like object.
refers to the fact that raw points to a list object, not a string. Calling str(raw) on its own returns a new string but does not change raw. To make raw point to a string, you need to do:
raw = str(raw) # instead of just str(raw)
But if text is indeed a string, why not just:
x = re.sub('<[^<]+?>', '', text)
For more details, see the documentation on str; the quote below is from there:
Return a str version of object. See str() for details.
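To answer the "iterate over a list" part directly: fetchall() returns a list of row tuples rather than a string, and str() on a list gives you the list's printed form (brackets and quotes included), which is probably not what you want to clean. A minimal sketch, assuming each row has a single column holding the HTML as a string; strip_tags, TAG_RE and the sample rows are hypothetical names and data used for illustration, not part of the original code:

```python
import re

# Same pattern as in the question, compiled once for reuse.
TAG_RE = re.compile(r'<[^<]+?>')

def strip_tags(rows):
    """Apply the regex to the first column of every row, one string at a time."""
    return [TAG_RE.sub('', row[0]) for row in rows]

# Hypothetical stand-in for cursor.fetchall(): a list of one-column tuples.
rows = [('<p>Hello <b>world</b></p>',), ('<div>Second</div>',)]

cleaned = strip_tags(rows)
print(cleaned)  # ['Hello world', 'Second']
```

With the real query you would pass text (the result of fetchall()) instead of rows. If the column can contain NULLs, row[0] will be None for those rows and they would need to be skipped or replaced first.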
Upvotes: 1