Reputation: 135
I have the following HTML and I am using Beautiful Soup to extract information from it. For example, I want to get Relationship status: Relationship
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td>
        <strong>
          Relationship status:
        </strong>
        Relationship
      </td>
    </tr>
    <tr class="alt">
      <td>
        <strong>
          Living:
        </strong>
        With partner
      </td>
    </tr>
I have created the following code:
xs = [x for x in soup.findAll('table', attrs = {'class':'box-content-list'})]
for x in xs:
    #print x
    sx = [s for s in x.findAll('tr',attrs={'class':'first'})]
    for s in sx:
        td_tabs = [td for td in s.findAll('td')]
        for td in td_tabs:
            title = td.findNext('strong')
            #print str(td)
            status = td.findNextSibling()
            print title.string
            print status
but the only result I get is Relationship status:, and print status prints None. What am I doing wrong?
Upvotes: 2
Views: 5202
Reputation: 8215
There is a dedicated method, get_text (getText in older BeautifulSoup versions), that returns the text of a tag together with the text of every tag nested inside it. With your example:
>>> example.td.get_text(' ', strip=True)
'Relationship status: Relationship'
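The first parameter is the separator placed between the text fragments; strip=True trims the whitespace around each fragment before they are joined.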
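For a self-contained version of the call above, here is a minimal sketch. It assumes the bs4 package (BeautifulSoup 4) on Python 3 with the built-in "html.parser" backend, and it closes the tbody and table tags that the question's snippet leaves open. The names HTML and cell_texts are made up for illustration.

```python
from bs4 import BeautifulSoup

# The markup from the question; the closing </tbody> and </table>
# tags are an assumption, since the original snippet is cut off.
HTML = """
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td>
        <strong>
          Relationship status:
        </strong>
        Relationship
      </td>
    </tr>
    <tr class="alt">
      <td>
        <strong>
          Living:
        </strong>
        With partner
      </td>
    </tr>
  </tbody>
</table>
"""


def cell_texts(html):
    """Return the flattened text of every <td> in the box-content-list table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="box-content-list")
    # " " joins the text fragments; strip=True trims each fragment first.
    return [td.get_text(" ", strip=True) for td in table.find_all("td")]


rows = cell_texts(HTML)
for line in rows:
    print(line)
```

This handles every row at once, so the "Living:" row comes out the same way as the "Relationship status:" row.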
Upvotes: 3
Reputation: 1124348
First of all, there is no need for all the list comprehensions. Yours do nothing but copy the results, so you can safely do without them.
There is no next sibling for your column (the row contains only one <td> tag), so findNextSibling() returns None. What you want instead is the .nextSibling attribute of the title (the <strong> tag), which is the text node that follows it:
for table in soup.findAll('table', attrs = {'class':'box-content-list'}):
    for row in table.findAll('tr',attrs={'class':'first'}):
        for col in row.findAll('td'):
            title = col.strong
            status = title.nextSibling
            print title.text.strip(), status.strip()
which prints:
Relationship status: Relationship
for your example.
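The same idea can be sketched for BeautifulSoup 4 on Python 3, where the attribute is spelled next_sibling and the method find_all. This is only an illustration under those assumptions: the helper name label_value_pairs is hypothetical, the sample markup adds the closing tags missing from the question, and the class filter on the rows is dropped so that both rows are returned.

```python
from bs4 import BeautifulSoup

# Markup from the question, with assumed closing tags added.
HTML = """
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td><strong> Relationship status: </strong> Relationship </td>
    </tr>
    <tr class="alt">
      <td><strong> Living: </strong> With partner </td>
    </tr>
  </tbody>
</table>
"""


def label_value_pairs(html):
    """Return (label, value) tuples, one per <td> that contains a <strong>."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for table in soup.find_all("table", class_="box-content-list"):
        for row in table.find_all("tr"):
            for col in row.find_all("td"):
                title = col.strong
                if title is None:
                    continue  # cell without a label
                # The text node right after </strong> holds the value.
                status = title.next_sibling
                value = status.strip() if isinstance(status, str) else ""
                pairs.append((title.get_text(strip=True), value))
    return pairs


pairs = label_value_pairs(HTML)
for label, value in pairs:
    print(label, value)
```

The isinstance check guards against cells where the label is followed by another tag, or by nothing at all, in which case the value is left empty rather than raising an error.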
Upvotes: 1