I have the following HTML:
<div id="infoTable">
<h4>
User
</h4>
<table>
<tbody>
<tr>
<td class="name">
<a href="/userpage/123">BillyBob12345</a>
</td>
</tr>
<tr>
<td class="name">
<a href="/userpage/124">JimBob43</a>
</td>
</tr>
</tbody>
</table>
<h4>
Super User
</h4>
<table>
<tbody>
<tr>
<td class="name">
<a href="/userpage/112">CookieMonster</a>
</td>
</tr>
</tbody>
</table>
</div>
Basically, I am looking to get two lists:
Users = [{"BillyBob12345" : "123"}, {"JimBob43" : "124"}]
SuperUsers = [{"CookieMonster" : "112"}]
I am currently using Python 2.7 with BeautifulSoup4. I can find all of the users, but I can't split them into their respective groups.
I was actually able to extract the info using this:
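# Compare stripped text so the match still works if the heading is padded with whitespace
header = BS.find('div').find('h4', text=lambda t: t and t.strip() == "User")
if header:
    # findAll (not find) collects every td.name in the table, not just the first row
    FindUsers = header.findNext('table').findAll('td', {"class" : "name"})
    Users = [{td.a.text.strip() : td.a['href'].split('/')[2]} for td in FindUsers]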
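The same "find the heading, then take the next table" idea can be wrapped in a small helper so both groups come from one function. This is a minimal sketch, assuming the markup from the question, Python 3, and BeautifulSoup 4.4 or later (for the `string=` keyword); `users_under_heading` and `HTML` are hypothetical names for illustration, and it assumes each <h4> is followed by its own <table> as a sibling.

```python
from bs4 import BeautifulSoup

# The markup from the question (rows condensed onto single lines)
HTML = """
<div id="infoTable">
<h4>
User
</h4>
<table><tbody>
<tr><td class="name"><a href="/userpage/123">BillyBob12345</a></td></tr>
<tr><td class="name"><a href="/userpage/124">JimBob43</a></td></tr>
</tbody></table>
<h4>
Super User
</h4>
<table><tbody>
<tr><td class="name"><a href="/userpage/112">CookieMonster</a></td></tr>
</tbody></table>
</div>
"""


def users_under_heading(soup, heading):
    """Hypothetical helper: return [{name: id}, ...] for the table after <h4>heading</h4>."""
    # Compare stripped text so "\nUser\n" matches "User" but "\nSuper User\n" does not
    h4 = soup.find('div', id='infoTable').find(
        'h4', string=lambda s: s is not None and s.strip() == heading)
    if h4 is None:
        return []
    table = h4.find_next_sibling('table')
    # find_all (not find) so every row's link is collected, not just the first one
    links = [td.a for td in table.find_all('td', class_='name') if td.a]
    # "/userpage/123".split('/') -> ['', 'userpage', '123'], so index 2 is the id
    return [{a.get_text(strip=True): a['href'].split('/')[2]} for a in links]


soup = BeautifulSoup(HTML, 'html.parser')
Users = users_under_heading(soup, 'User')
SuperUsers = users_under_heading(soup, 'Super User')
print(Users)       # [{'BillyBob12345': '123'}, {'JimBob43': '124'}]
print(SuperUsers)  # [{'CookieMonster': '112'}]
```

Matching on the exact stripped heading matters here: a substring test such as `'User' in s` would also match "Super User".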
If you know the tables always appear in that order, you can use a list comprehension to build each list of dictionaries, parsing the "userpage" number out of the href with .split('/'):
# Assumes the "User" table always comes before the "Super User" table
firstTable = soup.findAll('table')[0]
users = [{a.text : a['href'].split('/')[2]} for a in firstTable.findAll('a')]
secondTable = soup.findAll('table')[1]
superUsers = [{a.text : a['href'].split('/')[2]} for a in secondTable.findAll('a')]
>>> users
[{'BillyBob12345': '123'}, {'JimBob43': '124'}]
>>> superUsers
[{'CookieMonster': '112'}]
If you also want the heading text ("User") so you can use it as a key in a dictionary, you can step back from the table to the <h4> before it:
>>> firstTable.previousSibling.previousSibling
<h4>
User
</h4>
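The double .previousSibling is needed because the sibling directly before the <table> is the whitespace text node between </h4> and <table>, not the <h4> itself. A minimal sketch of a layout-independent alternative, assuming Python 3 and BeautifulSoup 4: `find_previous_sibling('h4')` skips text nodes, so each table can be labelled by its own heading without relying on table order. `groups` and `HTML` are hypothetical names for illustration.

```python
from bs4 import BeautifulSoup

# The markup from the question (rows condensed onto single lines)
HTML = """
<div id="infoTable">
<h4>
User
</h4>
<table><tbody>
<tr><td class="name"><a href="/userpage/123">BillyBob12345</a></td></tr>
<tr><td class="name"><a href="/userpage/124">JimBob43</a></td></tr>
</tbody></table>
<h4>
Super User
</h4>
<table><tbody>
<tr><td class="name"><a href="/userpage/112">CookieMonster</a></td></tr>
</tbody></table>
</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')
container = soup.find('div', id='infoTable')

# The node right before the first <table> is a whitespace-only text node, not the <h4>
first_table = container.find('table')
print(repr(str(first_table.previous_sibling)))

# Label each table with the nearest preceding <h4>, ignoring whitespace nodes
groups = {}
for table in container.find_all('table', recursive=False):
    heading = table.find_previous_sibling('h4').get_text(strip=True)
    groups[heading] = [
        {a.get_text(strip=True): a['href'].split('/')[2]}
        for a in table.find_all('a')
    ]

print(groups)
```

With this, groups['User'] and groups['Super User'] hold the two lists from the question, and a third heading/table pair would be picked up without code changes.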