Reputation: 308
This is the HTML code that I'm trying to parse with BeautifulSoup:
<table>
<tr>
<th width="100">menu1</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data1</li>
<li>Foo1<a href="/link/to/bar1">Bar1</a></li>
... (the number of these tags isn't fixed)
</ul>
</td>
</tr>
<tr>
<th width="100">menu2</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data2</li>
<li>Foo2<a href="/link/to/bar2">Bar2</a></li>
<li>Foo3<a href="/link/to/bar3">Bar3</a></li>
<li>Some data3</li>
... (the number of these tags isn't fixed either)
</ul>
</td>
</tr>
</table>
The output I would like to get is a dictionary like this:
DICT = {
'menu1': ['Some data1','Foo1 Bar1'],
'menu2': ['Some data2','Foo2 Bar2','Foo3 Bar3','Some data3'],
}
As I already mentioned in the code, the number of <li> tags is not fixed. Additionally, the number of menus (rows in <table></table>) is not fixed either, so e.g. it could look just like this:
<table>
<tr>
<th width="100">menu1</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data1</li>
<li>Foo1<a href="/link/to/bar1">Bar1</a></li>
... (the number of these tags isn't fixed)
</ul>
</td>
</tr>
</table>
I was trying to use this example, but with no success. I think it's because of the <ul> tags that I can't read the proper data from the table. The variable number of menus and <li> tags is also a problem for me.
So my question is: how do I parse this particular table into a Python dictionary?
I should mention that I already parsed some simple data with the .text attribute of the BeautifulSoup handler, so it would be nice if I could just keep it as is.
request = c.get('http://example.com/somepage.html')
soup = bs(request.text)
and this is always the first table of the page, so I can get it with:
table = soup.find_all('table')[0]
Thank you in advance for any help.
Upvotes: 1
Views: 3538
Reputation: 142651
html = """<table>
<tr>
<th width="100">menu1</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data1</li>
<li>Foo1<a href="/link/to/bar1">Bar1</a></li>
</ul>
</td>
</tr>
<tr>
<th width="100">menu2</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data2</li>
<li>Foo2<a href="/link/to/bar2">Bar2</a></li>
<li>Foo3<a href="/link/to/bar3">Bar3</a></li>
<li>Some data3</li>
</ul>
</td>
</tr>
</table>"""
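# Python 2 with BeautifulSoup 3 (the old "BeautifulSoup" module)
import BeautifulSoup as bs

soup = bs.BeautifulSoup(html)
table = soup.findAll('table')[0]
results = {}
th = table.findChildren('th')#,text=['menu1','menu2'])
for x in th:
    #print x
    results_li = []
    # skip the whitespace text node after <th> to reach the <td>
    li = x.nextSibling.nextSibling.findChildren('li')
    for y in li:
        #print y.next
        # y.next is the first text node inside <li>
        results_li.append(y.next)
    # x.next is the text inside <th>, used as the dictionary key
    results[x.next] = results_li
print results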
Output:
{
u'menu2': [u'Some data2', u'Foo2', u'Foo3', u'Some data3'],
u'menu1': [u'Some data1', u'Foo1']
}
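Note that the output above keeps only the first text node of each <li>, so it gives 'Foo1' rather than the 'Foo1 Bar1' asked for in the question. Below is a minimal sketch of one way to get the combined text. It is not the code above: it assumes Python 3 and the bs4 package (BeautifulSoup 4) with the built-in "html.parser", and table_to_dict is a hypothetical helper name chosen for illustration. It looks up the <li> tags per <tr>, so it does not depend on the whitespace between <th> and <td>.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

html = """<table>
<tr>
<th width="100">menu1</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data1</li>
<li>Foo1<a href="/link/to/bar1">Bar1</a></li>
</ul>
</td>
</tr>
<tr>
<th width="100">menu2</th>
<td>
<ul class="classno1" style="margin-bottom:10;">
<li>Some data2</li>
<li>Foo2<a href="/link/to/bar2">Bar2</a></li>
<li>Foo3<a href="/link/to/bar3">Bar3</a></li>
<li>Some data3</li>
</ul>
</td>
</tr>
</table>"""


def table_to_dict(markup):
    """Hypothetical helper: map each <th> text to the <li> texts of its row."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table")  # first table on the page, or None
    result = {}
    if table is None:
        return result
    for row in table.find_all("tr"):
        header = row.find("th")
        if header is None:
            continue  # skip rows without a menu name
        # get_text(" ", strip=True) joins all text fragments inside the <li>,
        # including the nested <a>, with a single space.
        items = [li.get_text(" ", strip=True) for li in row.find_all("li")]
        result[header.get_text(strip=True)] = items
    return result


result = table_to_dict(html)
print(result)
```

Because the loop runs over whatever <tr> and <li> tags are present, the same function should handle a table with a single menu or with any number of list items.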
Upvotes: 1