user8028

Reputation: 463

Get contents of div by id with BeautifulSoup

I am using Python 2.7.6, urllib2, and BeautifulSoup to extract HTML from a website and store it in a variable.

How can I show just the HTML contents of a div with a given id using BeautifulSoup?

<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>

should become

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

Upvotes: 9

Views: 36341

Answers (2)

Antony Hatchkins

Reputation: 33994

Since version 4.0.1, Beautiful Soup has had a decode_contents() method that returns a tag's inner HTML as a string:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
""", "html.parser")

>>> print(soup.div.decode_contents())

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

More details are in this answer to a related question: https://stackoverflow.com/a/18602241/237105
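
Note that soup.div selects the first div in the document, not the one with a particular id. Below is a minimal sketch that combines find(id=...) with decode_contents(), assuming Beautiful Soup 4 on Python 3 with the built-in "html.parser". The extra 'other' div and the variable name inner are made up for illustration:

```python
from bs4 import BeautifulSoup

html = """
<div id='other'><p>unrelated</p></div>
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() matches on the id attribute and returns None when nothing matches
div = soup.find("div", id="theDiv")

# decode_contents() renders the children only, without the <div> wrapper
inner = div.decode_contents() if div is not None else ""
print(inner.strip())
```

The None check is there because calling decode_contents() on a missing div would raise an AttributeError.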

Upvotes: 1

alecxe

Reputation: 473873

Join the elements of the div tag's .contents list:

from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

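# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
div = soup.find('div', id='theDiv')
# print(...) with a single argument works on both Python 2 and Python 3
print(''.join(map(str, div.contents)))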

Prints:

    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
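
The .contents list holds the whitespace strings between the tags as well as the tags themselves, so the joined result keeps the source's line breaks and indentation. If only the child tags are wanted, one option is to filter on type. This is a sketch assuming Beautiful Soup 4 on Python 3 with "html.parser"; the name tags_only is illustrative:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

soup = BeautifulSoup(data, "html.parser")
div = soup.find("div", id="theDiv")

# Keep only element children and drop the whitespace-only text nodes
tags_only = [child for child in div.contents if isinstance(child, Tag)]

# str() on a Tag renders it as HTML markup
print("\n".join(str(tag) for tag in tags_only))
```

This drops any text that sits directly inside the div, so it only suits cases where the div contains nothing but child elements.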

Upvotes: 18
