Reputation: 9348
A file contains HTML like the following (the words 'Registration' and 'Flying' are fixed labels):
<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>
<TR>
<TD class=CAT2 width="10%">Flying</TD>
<TD class=CAT1 width="20%">24 Jun 2005</TD></TR>
I want to extract them and print them as:
Registration 02 Mar 2006
Flying 24 Jun 2005
I am using BeautifulSoup's find_next_sibling, but it returns nothing. What went wrong?
from bs4 import BeautifulSoup
url = r"C:\example.html"
page = open(url)
soup = BeautifulSoup(page.read())
aa = soup.find_next_sibling(text='Registration')
print aa
Upvotes: 0
Views: 104
Reputation: 12092
This line of code:
aa = soup.find_next_sibling(text='Registration')
does not return a node from the HTML as you expect. It returns None, because find_next_sibling called on the soup object looks for siblings of the document itself, and the document has none. What you want to do instead is find the element with text='Registration', get its parent, and then get the parent's next sibling.
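aa = soup.find(text='Registration')
par = aa.parent
# find_next_sibling('td') skips the newline text node between the two cells,
# which plain .next_sibling would return first
print(par.find_next_sibling('td').string)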
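To see why whitespace matters when walking siblings, here is a small check against the question's markup. It is only a sketch: the markup is pasted in as a string instead of being read from the file, and the 'html.parser' backend is an assumption, since the question does not say which parser is installed.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; a real script would read it from the file.
html = """<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>"""

soup = BeautifulSoup(html, "html.parser")

# The soup object is the document root, so it has no siblings to search.
on_root = soup.find_next_sibling(string="Registration")

label_cell = soup.find("td")
# The immediate sibling is the newline between the two cells, not the next <td>.
raw_sibling = label_cell.next_sibling
# Asking for a tag name skips text nodes and lands on the date cell.
date_cell = label_cell.find_next_sibling("td")

print(repr(on_root))
print(repr(raw_sibling))
print(date_cell.string)
```

If the HTML had no line breaks between the cells, .next_sibling would land on the date cell directly, which is why the behaviour can differ from file to file.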
You could also build both output lines by walking the rows:
soup = BeautifulSoup(page.read(), 'html.parser')
row_1 = soup.find('tr')
td = row_1.find('td')
# find_next_sibling skips whitespace text nodes, unlike .next_sibling
string_1 = td.string + ' ' + td.find_next_sibling('td').string  # Registration 02 Mar 2006
row_2 = row_1.find_next_sibling('tr')
td = row_2.find('td')
string_2 = td.string + ' ' + td.find_next_sibling('td').string  # Flying 24 Jun 2005
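If the file has more than two such rows, the same idea could be written as a loop. The following is a minimal sketch, not the only way to do it: extract_pairs is a hypothetical helper name, the markup is inlined as a string for the example, and it assumes every row of interest holds exactly two cells.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; a real script would read it from the file.
html = """<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>
<TR>
<TD class=CAT2 width="10%">Flying</TD>
<TD class=CAT1 width="20%">24 Jun 2005</TD></TR>"""


def extract_pairs(markup):
    """Return one 'label value' string per two-cell table row (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for row in soup.find_all("tr"):
        # find_all returns only tags, so whitespace between cells is ignored.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:
            pairs.append(" ".join(cells))
    return pairs


lines = extract_pairs(html)
for line in lines:
    print(line)
```

Because find_all returns only tags, this version does not depend on how the cells are separated in the source.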
Upvotes: 0