Reputation: 1279

Extract content within a tag with BeautifulSoup

I'd like to extract the content Hello world. Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
...

I tried the following:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings

But it returned nothing.

In addition, I'm having trouble extracting My home address from the following:

<td><b>Address:</b></td>

<td>My home address</td>

I'm using the same method to search for text="Address: ", but how do I navigate down to the next line and extract the content of that <td>?

Upvotes: 54

Answers (4)

AnalyticsBuilder

Reputation: 4261

Use .next instead:

>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s)
>>> hello = soup.find(text='Name: ')
>>> hello.next
u'Hello world'

.next and .previous let you move through the document elements in the order they were processed by the parser, while the sibling methods work with the parse tree.
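
The difference can be sketched as follows, assuming bs4 (where .next is spelled .next_element) and the built-in html.parser. The two-cell Address row is assembled from the question's fragments, so its exact layout is an assumption:

```python
from bs4 import BeautifulSoup

# Markup assembled from the question's snippets (the Address row layout is assumed).
html = (
    '<table><tr><td colspan="2"><b>Name: </b>Hello world</td></tr>'
    '<tr><td><b>Address:</b></td><td>My home address</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# Name case: the label string sits inside <b>; the value follows the <b> tag.
label = soup.find(string="Name: ")
name_by_order = label.next_element        # next node in document order
name_by_tree = label.parent.next_sibling  # sibling of the enclosing <b> in the tree

# Address case: the value is in the following <td>, so search forward for it.
address_label = soup.find(string="Address:")
address = address_label.find_next("td").get_text()

print(name_by_order, "|", name_by_tree, "|", address)
```

Note that a string search is an exact match, so searching for "Address: " with a trailing space would not find the "Address:" label as it appears in the snippet.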

Upvotes: 21

Babatunde Mustapha

Reputation: 2663

Use the code below to extract the text content from HTML tags with Python's BeautifulSoup:

from bs4 import BeautifulSoup

s = '<td>Example information</td>'  # your raw HTML
soup = BeautifulSoup(s, 'html.parser')  # parse the HTML with BeautifulSoup
td = soup.find('td')  # tag of interest: <td>Example information</td>
td.text  # 'Example information', the text with the markup stripped
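
When the cell also contains nested tags, as in the question's Name cell, .text joins the text of every descendant. A small sketch of that behaviour, assuming bs4 and html.parser, with one possible way to keep only the cell's own text:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td colspan="2"><b>Name: </b>Hello world</td>', "html.parser")
td = soup.find("td")

# get_text() (the method behind .text) includes the text inside the nested <b>.
full_text = td.get_text()

# Restricting the search to direct children keeps only the strings
# that belong to the <td> itself, skipping the <b> label.
own_text = "".join(td.find_all(string=True, recursive=False))

print(repr(full_text), repr(own_text))
```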

Upvotes: 8

Олег Клишин

Reputation: 59

from bs4 import BeautifulSoup, Tag

def get_tag_html(tag: Tag):
    return ''.join(i.decode() if isinstance(i, Tag) else i for i in tag.contents)
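
A possible usage sketch for the helper above, applied to the question's Name cell (assuming bs4 and html.parser). It returns the inner HTML of the tag, markup included, rather than just the text:

```python
from bs4 import BeautifulSoup, Tag

def get_tag_html(tag: Tag) -> str:
    # Serialize child tags back to markup and pass plain strings through as-is.
    return "".join(
        child.decode() if isinstance(child, Tag) else child
        for child in tag.contents
    )

soup = BeautifulSoup('<td colspan="2"><b>Name: </b>Hello world</td>', "html.parser")
inner = get_tag_html(soup.td)
print(inner)
```

bs4 also provides Tag.decode_contents(), which should give the same result for this input.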

Upvotes: 3

solvingPuzzles

Reputation: 8889

The .contents attribute works well for extracting text from <tag>text</tag>. It returns a list of the tag's children.

<td>My home address</td> example:

s = '<td>My home address</td>'
soup =  BeautifulSoup(s)
td = soup.find('td') #<td>My home address</td>
td.contents #['My home address']

<td><b>Address:</b></td> example:

s = '<td><b>Address:</b></td>'
soup =  BeautifulSoup(s)
td = soup.find('td').find('b') #<b>Address:</b>
td.contents #['Address:']
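
Since .contents is a list of child nodes, a tag with a single text child can also be read with .string, which gives the text directly. A minimal sketch, assuming bs4 and html.parser, with both cells placed in one snippet:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<td><b>Address:</b></td><td>My home address</td>", "html.parser"
)
cells = soup.find_all("td")

contents = cells[1].contents  # list holding the single text node
value = cells[1].string       # the text node itself
label = cells[0].b.string     # text inside the nested <b>

print(contents, "|", value, "|", label)
```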

Upvotes: 54
