Reputation: 129
I want to get the content of a div with the class "gt-read". That div contains HTML comments and another div with a different class, and I don't need either of them. Below is my script:
data = """
<div class='gt-read'>
<!-- no need -->
<!-- some no need -->
<b>Bold text</b> - some text here <br/>
lorem ipsum here <br/>
<strong> Author Name</strong>
<div class='some-class'>
<script>
#...
Js script here
#...
</script>
</div>
</div>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
# Look up the div by the class that actually appears in the markup
get_class = soup.find("div", {"class" : "gt-read"})
print(get_class.get_text())
print(get_class)
I want the result to look like this:
<b>Bold text</b> - some text here <br/>
lorem ipsum here <br/>
<strong> Author Name</strong>
Kindly help.
Upvotes: 0
Views: 1377
Reputation: 46779
The following should display what you need:
from bs4 import BeautifulSoup, Comment
data = """
<div class='gt-read'>
<!-- no need -->
<!-- some no need -->
<b>Bold text</b> - some text here <br/>
lorem ipsum here <br/>
<strong> Author Name</strong>
<div class='some-class'>
<script>
#...
Js script here
#...
</script>
</div>
</div>
"""
soup = BeautifulSoup(data, 'lxml')
get_class = soup.find("div", {"class" : "gt-read"})
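# Collect every HTML comment inside the div and remove each one from the tree
comments = get_class.find_all(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()
# Remove the nested div (and the script inside it)
get_class.find("div").extract()
# Serialize what is left inside the div, without the outer <div> tag itself
text = get_class.decode_contents().strip()
print(text)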
This gives the following output:
<b>Bold text</b> - some text here <br/>
lorem ipsum here <br/>
<strong> Author Name</strong>
This finds the div with the gt-read class, removes all comments and the nested div from it, and returns the remaining markup.
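For reuse, the same steps could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name inner_markup and its outer_class parameter are made up for illustration, it targets Python 3, and it swaps in the standard-library "html.parser" backend so that lxml is not required. Unlike the code above, it removes every div that is a direct child of the outer div, not just the first one.

```python
from bs4 import BeautifulSoup, Comment


def inner_markup(html, outer_class="gt-read"):
    """Hypothetical helper: inner HTML of the matching div, minus comments and child divs."""
    # "html.parser" is an assumption made here to avoid the lxml dependency.
    soup = BeautifulSoup(html, "html.parser")
    outer = soup.find("div", class_=outer_class)
    if outer is None:
        return None
    # Remove every HTML comment found anywhere inside the outer div.
    for comment in outer.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    # Remove each div that is a direct child, together with everything inside it.
    for nested in outer.find_all("div", recursive=False):
        nested.decompose()
    # Serialize the remaining children without the outer <div> tag itself.
    return outer.decode_contents().strip()


data = """
<div class='gt-read'>
<!-- no need -->
<b>Bold text</b> - some text here <br/>
lorem ipsum here <br/>
<strong> Author Name</strong>
<div class='some-class'>
<script>
// Js script here
</script>
</div>
</div>
"""

result = inner_markup(data)
print(result)
```

Returning None when no matching div exists lets the caller decide how to handle pages that lack the expected markup, instead of failing with an AttributeError.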
Upvotes: 2