paro

Reputation: 11

Extracting text from HTML file gives AttributeError

I am trying to extract the text from an .htm file in my Jupyter notebook. I first read the file using:

with open('Materials.htm') as b:
    file3 = b.readlines()
    file3 = ''.join(file3)

Then, I parse the file and use get_text().

Stock_page = BeautifulSoup(file3, 'lxml')
for movers_name in Stock_page('td', style="text-align:left;"):
    movers = list()
    movers.append(movers_name.get_text())
    print(movers)

This code does print the list, but it also raises:

AttributeError: 'NoneType' object has no attribute 'get_text'

I want to put this in a for loop to read different files, but with the error it doesn't work. Does anyone know what I am doing wrong? Thank you!

Upvotes: 0

Views: 103

Answers (1)

prithajnath

Reputation: 2115

You should pass the file object just as it is to BeautifulSoup and parse it as HTML.

from bs4 import BeautifulSoup

# Hand the open file object straight to BeautifulSoup
with open('Materials.htm', 'r') as f:
    Stock_page = BeautifulSoup(f, "html.parser")
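
To apply this across several files, as the question asks, a minimal sketch might look like the following. The helper names `extract_cell_text` and `extract_from_files` are made up for illustration, and the `td` filter is copied from the question, so adjust it if your pages are styled differently. The list is created once before the loop so that entries accumulate instead of being reset on every iteration.

```python
from bs4 import BeautifulSoup


def extract_cell_text(path):
    """Return the text of every left-aligned <td> cell in one HTML file."""
    # Pass the open file object directly to BeautifulSoup.
    with open(path, "r") as f:
        soup = BeautifulSoup(f, "html.parser")

    movers = []  # created once, outside the loop
    # Same filter as in the question: <td style="text-align:left;">
    for cell in soup.find_all("td", style="text-align:left;"):
        movers.append(cell.get_text(strip=True))
    return movers


def extract_from_files(paths):
    """Map each file path to the list of cell texts found in that file."""
    return {str(p): extract_cell_text(p) for p in paths}
```

Calling `extract_from_files(['Materials.htm'])` should then give a dictionary with one list of cell texts per file, which you can loop over or print as needed.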

Upvotes: 2
