Reputation: 22440
I've tried to parse the content that follows each comment in the snippet below, but it doesn't work at all. How can I make it work? My intention is to get the text within the p tag, and the output should be:
Hi there!!
Hi again!!
The script I've tried:
from bs4 import BeautifulSoup, Comment
content="""
<!-- comment --><a href="https://extratorrent.ag/"><p>Hi there!!</p></a>
<!-- comment1 --><a href="https://thepiratebay.se/"><p>Hi again!!</p></a>
"""
soup = BeautifulSoup(content, 'lxml')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    data = BeautifulSoup(comment.next_element, "lxml")
    for item in data.select("p"):
        print(item.text)
The error I'm having:
Traceback (most recent call last):
  File "C:\AppData\Local\Programs\Python\Python35-32\Social.py", line 9, in <module>
    data = BeautifulSoup(comment.next_element,"lxml")
  File "C:\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\__init__.py", line 191, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable
Upvotes: 1
Views: 95
Reputation: 402363
Switch to html.parser, and then just access the p tag inside.
The advantage of html.parser is that it does not add extra <html><body>...</body></html> tags around your soup data. You can then access the contents of the p tag using comment.next_element.p.text.
soup = BeautifulSoup(content, 'html.parser')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    print(comment.next_element.p.text)
Hi there!!
Hi again!!
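For completeness, here is a self-contained sketch of the same idea that can be run on its own. The helper name paragraph_text_after_comments is hypothetical (it is not part of bs4), and the isinstance guard is an added assumption for markup where a comment is followed by whitespace or nothing rather than a tag. Note that next_element already yields a Tag, so there is no need to feed it back into BeautifulSoup; doing so appears to be what triggers the TypeError in the question.

```python
from bs4 import BeautifulSoup, Comment
from bs4.element import Tag


def paragraph_text_after_comments(markup):
    """Return the text of the first <p> inside the element following each comment.

    Hypothetical helper for illustration; not part of bs4.
    """
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        following = comment.next_element
        # next_element may be a plain string (e.g. whitespace) or None,
        # so only descend into it when it is an actual tag containing a <p>.
        if isinstance(following, Tag) and following.p is not None:
            results.append(following.p.get_text())
    return results


content = """
<!-- comment --><a href="https://extratorrent.ag/"><p>Hi there!!</p></a>
<!-- comment1 --><a href="https://thepiratebay.se/"><p>Hi again!!</p></a>
"""

for text in paragraph_text_after_comments(content):
    print(text)
```

If a comment is not immediately followed by a tag, the guard simply skips it instead of raising an AttributeError.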
Upvotes: 2