Reputation: 11
I am trying to extract the text from an .htm file in my Jupyter notebook. I first read the file using:
with open('Materials.htm') as b:
    file3=b.readlines()
    file3=''.join(file3)
Then, I parse the file and use get_text().
Stock_page=BeautifulSoup(file3, 'lxml')
for movers_name in Stock_page('td',style="text-align:left;"):
    movers=list()
    movers.append(movers_name.get_text())
    print(movers)
This code does print the list, but it also raises:
AttributeError: 'NoneType' object has no attribute 'get_text'
I want to put this in a for loop to read different files, but with the error it doesn't work. Does anyone know what I am doing wrong? Thank you!
Upvotes: 0
Views: 103
Reputation: 2115
You should pass the file object to BeautifulSoup just as it is and parse it as HTML. There is no need to read the lines and join them first.
with open('Materials.htm','r') as f:
    Stock_page = BeautifulSoup(f, "html.parser")
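Since you want to run this over several files, here is a rough sketch of how that could look. The helper names left_aligned_cells and cells_per_file and the sample markup are made up for illustration, and I am assuming the cells you want are exactly the td tags whose style attribute is "text-align:left;", as in your snippet. It uses find_all, which returns only tags (never None), and builds the list once instead of resetting it on every iteration.

```python
from bs4 import BeautifulSoup


def left_aligned_cells(markup):
    """Return the text of every <td style="text-align:left;"> in the markup.

    `markup` can be a string or an open file object.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # find_all yields Tag objects only, so get_text() is safe on each item.
    cells = soup.find_all("td", style="text-align:left;")
    return [cell.get_text(strip=True) for cell in cells]


def cells_per_file(paths):
    """Hypothetical helper: map each file path to its list of cell texts."""
    result = {}
    for path in paths:
        with open(path, "r") as f:
            result[path] = left_aligned_cells(f)
    return result


# Made-up sample markup standing in for the contents of an .htm file.
sample = """
<table>
  <tr><td style="text-align:left;">ACME</td><td style="text-align:right;">1.2</td></tr>
  <tr><td style="text-align:left;">Globex</td><td>3.4</td></tr>
</table>
"""
print(left_aligned_cells(sample))
```

For your real files you would call something like cells_per_file(['Materials.htm']) and add the other file names to that list.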
Upvotes: 2