Here's part of my HTML page, which I parse into a variable using Beautiful Soup. I need to extract some of the text values and insert them into a table later on: the player's name, team, and points.
I can get the first player name, and the second one using next_sibling, but I couldn't iterate through the whole page.
<h3>NBA Player Points</h3>
<br>
0089, Thu Jan 16 03:00:00 CET 2020, DEN/CHA-Murray J. (DEN)
<ul>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.85, Over : 1.85, OU : 18.5]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Index Rating [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Assists [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Rebounds [Under : 1.0, Over : 1.0, OU : 0.0]</li>
</ul>
0761, Thu Jan 16 03:00:00 CET 2020, DEN/CHA-Rozier T. (CHA)
<ul>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.75, Over : 1.95, OU : 18.5]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Index Rating [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Assists [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Rebounds [Under : 1.0, Over : 1.0, OU : 0.0]</li>
</ul>
1491, Thu Jan 16 03:00:00 CET 2020, DEN/CHA-Grant J. (DEN)
<ul>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Points [Under : 1.85, Over : 1.85, OU : 13.5]</li>
<li>Player Points [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Index Rating [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Assists [Under : 1.0, Over : 1.0, OU : 0.0]</li>
<li>Player Rebounds [Under : 1.0, Over : 1.0, OU : 0.0]</li>
</ul>
Here's what I'd like to get from this HTML:
Player: Murray J.
Team: DEN
Player Points: 18.5
Player: Rozier T.
Team: CHA
Player Points: 18.5
Player: Grant J.
Team: DEN
Player Points: 13.5
Any ideas?
Not the most elegant code, but it should get you there. The main string-manipulation tool used here is the partition() method, which splits a string at the first occurrence of a separator and returns three parts: the text before it, the separator itself, and the text after it. Unneeded characters are then removed from those pieces with the strip() and replace() methods.
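To see what partition() returns at each step, here is the header text of the first player run through the same two splits (a standalone illustration; the variable names are my own). strip('()') appears as a shorter alternative to the two replace() calls; note that it only removes those characters from the ends of the string.

```python
# Header text as it appears just before the first <ul> in the question's HTML
header = "0089, Thu Jan 16 03:00:00 CET 2020, DEN/CHA-Murray J. (DEN)"

# partition() splits at the FIRST occurrence of the separator and always
# returns a 3-tuple: (text before, separator, text after)
before, sep, after = header.partition('-')
print(after)                # Murray J. (DEN)

name, _, team = after.partition('. ')
print(name + '.')           # Murray J.
print(team.strip('()'))     # DEN

# If the separator is missing, the whole string lands in the first slot
print("no dash here".partition('-'))   # ('no dash here', '', '')
```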
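from bs4 import BeautifulSoup as bs

players = """[your html above]"""
soup = bs(players, 'lxml')

# Each player's markets live in a <ul>; the header line is the text node just before it
names = soup.select('ul')
for name in names:
    # "0089, Thu Jan 16 ..., DEN/CHA-Murray J. (DEN)" -> "Murray J. (DEN)"
    dat = name.previous_element.strip().partition('-')[2]
    print('Name:', dat.partition('. ')[0] + '.')
    print('Team:', dat.partition('. ')[2].replace('(', '').replace(')', ''))
    # The third <li> holds the real line, e.g. "... OU : 18.5]" -> "18.5"
    print('Player Points:', name.select('li')[2].text.partition(', OU : ')[2].replace(']', ''))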
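The fixed separators and the hard-coded [2] index work for the sample, but they can pick the wrong piece if the layout drifts (for example, a name containing ". " or the real line moving to another position). As an alternative technique, not the answer's own, regular expressions can pull out the same three fields. This stdlib-only sketch assumes every header ends in "-<Name> (<TEAM>)" and that the wanted market is the first "Player Points" line whose OU value is not 0.0, as in the sample HTML; parse_player, HEADER_RE, and OU_RE are hypothetical names. With Beautiful Soup you would call it with ul.previous_element and [li.get_text() for li in ul.find_all('li')].

```python
import re

# Assumed header shape: "<id>, <date>, <TEAMS>-<Name> (<TEAM>)".
# [^-]* stops at the first hyphen, so hyphenated surnames stay intact.
HEADER_RE = re.compile(r'^[^-]*-(?P<player>.+?)\s*\((?P<team>[A-Z]+)\)\s*$')
# Assumed market shape: "... OU : <number>]"
OU_RE = re.compile(r'OU : ([\d.]+)\]')


def parse_player(header, li_texts):
    """Hypothetical helper: return a dict for one player, or None if nothing matches."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        return None
    for text in li_texts:
        ou = OU_RE.search(text)
        # Assumption: placeholder lines carry OU 0.0, so keep the first non-zero one
        if text.startswith('Player Points') and ou and float(ou.group(1)) != 0.0:
            return {'Player': m.group('player'),
                    'Team': m.group('team'),
                    'Player Points': float(ou.group(1))}
    return None


header = "\n0761, Thu Jan 16 03:00:00 CET 2020, DEN/CHA-Rozier T. (CHA)\n"
items = [
    "Player Points [Under : 1.0, Over : 1.0, OU : 0.0]",
    "Player Points [Under : 1.0, Over : 1.0, OU : 0.0]",
    "Player Points [Under : 1.75, Over : 1.95, OU : 18.5]",
    "Player Points [Under : 1.0, Over : 1.0, OU : 0.0]",
    "Index Rating [Under : 1.0, Over : 1.0, OU : 0.0]",
]
print(parse_player(header, items))
# {'Player': 'Rozier T.', 'Team': 'CHA', 'Player Points': 18.5}
```

Selecting the line by its content rather than by position means an extra placeholder line before the real one does not shift the result.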
Output:
Name: Murray J.
Team: DEN
Player Points: 18.5
Name: Rozier T.
Team: CHA
Player Points: 18.5
Name: Grant J.
Team: DEN
Player Points: 13.5
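Since the goal is to insert these values into a table later, one option (an assumption, as the question does not say what kind of table) is to append a tuple per player inside the loop instead of printing, then load the tuples with Python's built-in sqlite3 module. The table name player_points and its columns are hypothetical; the rows are the values from the output above.

```python
import sqlite3

# Values from the output above; inside the loop you would append
# (name, team, float(points)) instead of printing.
rows = [
    ('Murray J.', 'DEN', 18.5),
    ('Rozier T.', 'CHA', 18.5),
    ('Grant J.', 'DEN', 13.5),
]

# In-memory database for the sketch; pass a file path instead to keep the data
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE player_points (player TEXT, team TEXT, points REAL)')
# Parameterized insert: sqlite3 fills each ? from the tuple, so no manual quoting
conn.executemany('INSERT INTO player_points VALUES (?, ?, ?)', rows)
conn.commit()

for row in conn.execute('SELECT player, team, points FROM player_points ORDER BY points DESC, player'):
    print(row)
```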