Reputation: 1279
I'd like to extract the content "Hello world". Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:
<table border="0" cellspacing="2" width="800">
<tr>
<td colspan="2"><b>Name: </b>Hello world</td>
</tr>
<tr>
...
I tried the following:
hello = soup.find(text='Name: ')
hello.findPreviousSiblings
But it returned nothing.
In addition, I'm having trouble extracting "My home address" from the following:
<td><b>Address:</b></td>
<td>My home address</td>
I'm using the same method to search for text="Address: ", but how do I navigate down to the next line and extract the content of that <td>?
Upvotes: 54
Views: 118693
Reputation: 4261
Use .next instead:
>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s)
>>> hello = soup.find(text='Name: ')
>>> hello.next
u'Hello world'
.next and .previous let you move through the document elements in the order the parser processed them, while the sibling methods work with the parse tree.
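To make the difference concrete, here is a minimal sketch assuming Beautiful Soup 4, where .next is spelled .next_element and the text= argument is spelled string=. It suggests why the sibling lookup in the question came back empty: the "Name: " text node is the only child of <b>, so it has no siblings of its own. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = '<table><tr><td colspan="2"><b>Name: </b>Hello world</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

label = soup.find(string="Name: ")  # the text node inside <b>
bold = label.parent                 # the enclosing <b> tag

# Tree view: the text node is the only child of <b>, so it has no siblings.
label_has_siblings = (label.previous_sibling is not None
                      or label.next_sibling is not None)

# Document order: the node parsed right after "Name: " is "Hello world".
via_next_element = label.next_element

# Tree view again: "Hello world" is a sibling of <b>, not of the text inside it.
via_sibling = bold.next_sibling

print(label_has_siblings)  # False
print(via_next_element)    # Hello world
print(via_sibling)         # Hello world
```

Either route reaches the same node here. Going through the parent's sibling is arguably less sensitive to how deeply the label is nested.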
Upvotes: 21
Reputation: 2663
Use the code below to extract text and content from HTML tags with Python and Beautiful Soup:
from bs4 import BeautifulSoup

s = '<td>Example information</td>'  # your raw HTML
soup = BeautifulSoup(s, 'html.parser')  # parse the HTML with Beautiful Soup
td = soup.find('td')  # tag of interest: <td>Example information</td>
td.text  # 'Example information' (the text with the tags stripped)
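Applied to the cell from the question, .text returns the label and the value together, so one more step is needed to isolate the value. The sketch below assumes Beautiful Soup 4 and that the value is always the last text node in the cell. That is an assumption about the page, not something the question guarantees.

```python
from bs4 import BeautifulSoup

html = '<table><tr><td colspan="2"><b>Name: </b>Hello world</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

whole_cell = td.get_text()         # label and value joined into one string
parts = list(td.stripped_strings)  # one entry per text node, whitespace trimmed
value = parts[-1]                  # assumes the value is the last text node

print(whole_cell)  # Name: Hello world
print(parts)       # ['Name:', 'Hello world']
print(value)       # Hello world
```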
Upvotes: 8
Reputation: 59
from bs4 import BeautifulSoup, Tag
def get_tag_html(tag: Tag) -> str:
    # Inner HTML of a tag: child tags are serialized, plain strings are kept as-is.
    return ''.join(i.decode() if isinstance(i, Tag) else i for i in tag.contents)
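A short usage sketch for this helper, assuming Beautiful Soup 4 and the cell from the question as input. The function is repeated so the block runs on its own. It returns the markup inside the tag, whereas calling decode() on the tag itself also includes the enclosing <td>.

```python
from bs4 import BeautifulSoup, Tag

def get_tag_html(tag: Tag) -> str:
    # Serialize child tags, keep plain strings unchanged, and join the pieces.
    return "".join(
        child.decode() if isinstance(child, Tag) else child
        for child in tag.contents
    )

soup = BeautifulSoup('<td colspan="2"><b>Name: </b>Hello world</td>', "html.parser")
td = soup.find("td")

inner = get_tag_html(td)  # markup between <td ...> and </td>
outer = td.decode()       # markup including the <td> tag itself

print(inner)  # <b>Name: </b>Hello world
print(outer)  # <td colspan="2"><b>Name: </b>Hello world</td>
```

Beautiful Soup 4 also ships Tag.decode_contents(), which yields the same string for this input.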
Upvotes: 3
Reputation: 8889
The .contents attribute works well for extracting text from <tag>text</tag>. It returns a list of the tag's children.
<td>My home address</td> example:
s = '<td>My home address</td>'
soup = BeautifulSoup(s)
td = soup.find('td')  # <td>My home address</td>
td.contents  # ['My home address']
<td><b>Address:</b></td> example:
s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s)
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents  # ['Address:']
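For the second half of the question (getting from the "Address:" label to the cell beside it), one possible approach is to climb from the <b> tag to its <td> and then step to the next sibling <td>. This is a sketch assuming Beautiful Soup 4 and that both cells sit in the same row. Note that the markup shown contains "Address:" without a trailing space, so an exact-match search for "Address: " would not find it.

```python
from bs4 import BeautifulSoup

html = "<table><tr><td><b>Address:</b></td><td>My home address</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

label = soup.find("b", string="Address:")        # the <b> tag holding the label
label_cell = label.find_parent("td")             # the <td> that wraps the label
value_cell = label_cell.find_next_sibling("td")  # the neighbouring <td>
address = value_cell.get_text(strip=True)

print(address)  # My home address
```

If the real page puts whitespace or other nodes between the cells, find_next_sibling("td") skips over them because it only matches <td> tags.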
Upvotes: 54