I'm using python-readability (https://github.com/buriy/python-readability) and struggling with it, because I can't find any documentation for it (is there any?).
Calling help(Document) shows some usable information, but something still goes wrong.
My code so far:
from readability.readability import Document
import requests
url = 'http://www.somepage.com'
html = requests.get(url, verify=False).content
readable_article = Document(html, negative_keywords='test_keyword').summary()
with open('test.html', 'w', encoding='utf-8') as test_file:
    test_file.write(readable_article)
According to the help(Document) output, negative_keywords should also accept a list:
readable_article = Document(html, negative_keywords=['test_keyword1', 'test_keyword2']).summary()
This raises the following error, which I don't understand:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 163, in summary
    candidates = self.score_paragraphs()
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 300, in score_paragraphs
    candidates[parent_node] = self.score_node(parent_node)
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 360, in score_node
    content_score = self.class_weight(elem)
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 348, in class_weight
    if self.negative_keywords and self.negative_keywords.search(feature):
AttributeError: 'list' object has no attribute 'search'
Could someone please give me a hint about what causes this error or how to deal with it?
There's a bug in the library code. Look at compile_pattern:
def compile_pattern(elements):
    if not elements:
        return None
    elif isinstance(elements, (list, tuple)):
        # a list or tuple is returned as a plain list, not compiled into a regex
        return list(elements)
    elif isinstance(elements, regexp_type):
        return elements
    else:
        # assume string or string like object
        elements = elements.split(',')
        return re.compile(u'|'.join([re.escape(x.lower()) for x in elements]), re.U)
It only builds a regex from elements when they are a non-empty string. A list or tuple comes back unchanged as a plain list, and an already compiled regex is passed through as-is.
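To see the failure in isolation, here is a minimal standalone sketch. It copies the compile_pattern excerpt quoted above and then calls .search() on the result, the way the traceback shows class_weight doing. regexp_type is defined locally as the type of a compiled pattern; that is an assumption about how the library defines it, so this reproduces the logic rather than importing the library.

```python
import re

# Assumption: the library's regexp_type is the type of a compiled pattern.
regexp_type = type(re.compile(''))


def compile_pattern(elements):
    """Standalone copy of the compile_pattern excerpt quoted above."""
    if not elements:
        return None
    elif isinstance(elements, (list, tuple)):
        return list(elements)  # a list comes back unchanged, not compiled
    elif isinstance(elements, regexp_type):
        return elements
    else:
        # assume string or string like object
        elements = elements.split(',')
        return re.compile(u'|'.join([re.escape(x.lower()) for x in elements]), re.U)


from_list = compile_pattern(['test_keyword1', 'test_keyword2'])
from_string = compile_pattern('test_keyword1,test_keyword2')

print(isinstance(from_list, list))            # True
print(isinstance(from_string, regexp_type))   # True

try:
    # class_weight() calls .search() on whatever compile_pattern returned
    from_list.search('test_keyword1')
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'search'
```

The list input survives as a list, so the first .search() call fails exactly as in the traceback, while the comma-separated string produces a usable pattern.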
Later on, though, the code assumes that self.negative_keywords is a regular expression and calls .search() on it. So I suggest passing your keywords as a single comma-separated string, e.g. "test_keyword1,test_keyword2". That way compile_pattern returns a regular expression, which should fix the error.