Reputation: 1678
When I call the following function:
def get_words(self):
    blocks = self.soup.find_all("block", {"blockType": lambda x: x not in ('Separator', 'SeparatorsBox')})
    wrds_blcks = []
    for i, block in enumerate(blocks):
        if block['blockType'] == 'Table':
            rslt = self._get_words_from_block_table(block)
        else:
            rslt = self._get_words_from_block_text(block)
            rslt = self._cleanup_word(rslt)
        if rslt != [[]] and rslt != []:
            wrds_blcks.append(rslt)
    return wrds_blcks
I get the following error:
in get_words
blocks = self.soup.find_all("block", {"blockType": lambda x: x not in ('Separator', 'SeparatorsBox')})
AttributeError: 'AbbyExtractor' object has no attribute 'soup'
The error refers to the first line of the method:
blocks = self.soup.find_all("block", {"blockType": lambda x: x not in ('Separator', 'SeparatorsBox')})
What's wrong?
Upvotes: 1
Views: 370
Reputation: 5110
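The error means that self.soup was never assigned on your AbbyExtractor instance before get_words ran. You need to make the soup first: pass the HTML retrieved from the web page as an argument to the get_words method, build the soup from it, and then do your tasks.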
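from bs4 import BeautifulSoup

def get_words(self, html):
    # Build the soup here so that self.soup exists before it is used.
    # Note: HTML parsers such as "lxml" lowercase attribute names; if the input
    # is XML with mixed-case attributes like blockType, the "xml" parser may be needed.
    self.soup = BeautifulSoup(html, "lxml")
    blocks = self.soup.find_all("block", {"blockType": lambda x: x not in ('Separator', 'SeparatorsBox')})
    wrds_blcks = []
    for i, block in enumerate(blocks):
        if block['blockType'] == 'Table':
            rslt = self._get_words_from_block_table(block)
        else:
            rslt = self._get_words_from_block_text(block)
            rslt = self._cleanup_word(rslt)
        if rslt != [[]] and rslt != []:
            wrds_blcks.append(rslt)
    return wrds_blcks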
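As a side note, here is a minimal sketch of why the AttributeError appears and of one alternative fix, which is to assign the attribute once in __init__ instead of inside get_words. BrokenExtractor and FixedExtractor are hypothetical stand-ins, not your real AbbyExtractor, and a plain string takes the place of the BeautifulSoup object so the example runs without extra packages.

```python
class BrokenExtractor:
    """Hypothetical stand-in: get_words reads self.soup, but nothing ever sets it."""

    def get_words(self):
        return self.soup.split()


class FixedExtractor:
    """Hypothetical stand-in: the parsed document is stored once, in __init__."""

    def __init__(self, document):
        # In the real class this would be the BeautifulSoup object
        # (assumption: the document is available when the extractor is created).
        self.soup = document

    def get_words(self):
        return self.soup.split()


# Calling the method before the attribute exists raises AttributeError.
try:
    BrokenExtractor().get_words()
except AttributeError as exc:
    error_message = str(exc)

# Once __init__ assigns self.soup, the same method body works.
words = FixedExtractor("two words").get_words()
```

Whether you build the soup in __init__ or in get_words is a design choice: __init__ suits one extractor per document, while passing html to get_words lets one instance handle many documents.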
Upvotes: 1