Leo Torres

Reputation: 690

Using Beautiful Soup to extract key-value pairs from the data-op-info attribute

The code below runs without errors, but it is incomplete. From this point, I am trying to get only the fullgame values into a DataFrame.

import json
from bs4 import BeautifulSoup
import urllib.request

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')

results = soup.find_all(class_="op-item op-spread op-opening")

# Each data-op-info attribute holds a JSON string; parse it and show its key-value pairs
for result in results:
    print(json.loads(result['data-op-info']).items())

I added the print call at the end to inspect the output while trying to extract just the line value.

Note: there is a similar question on this site, but its solution only works for a single div. It fails when the variable holds multiple divs.
How to parse information between {} on web page using Beautifulsoup

Upvotes: 0

Views: 502

Answers (1)

Jonathan Leon

Reputation: 5648

You were almost there. Use a list comprehension to capture the parsed results, then pass the list to pd.json_normalize():

import json
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')

results = soup.find_all(class_="op-item op-spread op-opening")

# Parse each element's data-op-info JSON string into a dict
rlist = [json.loads(result['data-op-info']) for result in results]
# Build a DataFrame with one column per key
pd.json_normalize(rlist)

   fullgame firsthalf secondhalf firstquarter secondquarter thirdquarter fourthquarter
0      -4.5      -2.5       -1.5         -0.5          -0.5         -0.5          -0.5
1      +4.5      +2.5       +1.5         +0.5          +0.5         +0.5          +0.5
2        +7        +4       +3.5           +3            +3         +2.5            +2
3        -7        -4       -3.5           -3            -3         -2.5            -2
4        -3        -3       -2.5         -0.5            -2         -0.5          -0.5
5        +3        +3       +2.5         +0.5            +2         +0.5          +0.5
6        +3      +2.5       +0.5         +0.5          +0.5         +0.5          +0.5
7        -3      -2.5       -0.5         -0.5          -0.5         -0.5          -0.5
8        -3      -0.5       -0.5         -0.5          -0.5         -0.5          -0.5
9        +3      +0.5       +0.5         +0.5          +0.5         +0.5          +0.5
10       -3      -2.5         -1         -0.5            -1         -0.5          -0.5
11       +3      +2.5         +1         +0.5            +1         +0.5          +0.5
12       -1      +0.5       -0.5         +0.5          -0.5         -0.5          -0.5
13       +1      -0.5       +0.5         -0.5          +0.5         +0.5          +0.5
14     +2.5      +3.5         +3         +0.5          +2.5         +0.5            +1
15     -2.5      -3.5         -3         -0.5          -2.5         -0.5            -1
16       +4        +3         +2         +0.5            +1         +0.5          +0.5
17       -4        -3         -2         -0.5            -1         -0.5          -0.5
18     -2.5      -0.5       -0.5         +0.5          -0.5         -0.5          -0.5
19     +2.5      +0.5       +0.5         -0.5          +0.5         +0.5          +0.5
20     -2.5      -1.5       -0.5         -0.5          -0.5         -0.5          -0.5
21     +2.5      +1.5       +0.5         +0.5          +0.5         +0.5          +0.5
22     +2.5      +1.5       +0.5         +0.5          +0.5         +0.5          +0.5
23     -2.5      -1.5       -0.5         -0.5          -0.5         -0.5          -0.5
24     +1.5      +1.5         Ev         +0.5          -0.5         -0.5          -0.5
25     -1.5      -1.5         Ev         -0.5          +0.5         +0.5          +0.5
26     +5.5        +3       +2.5         +0.5          +0.5         +0.5          +0.5
27     -5.5        -3       -2.5         -0.5          -0.5         -0.5          -0.5
28     -3.5      -0.5         Ev         -0.5          +0.5         +0.5          +0.5
29     +3.5      +0.5         Ev         +0.5          -0.5         -0.5          -0.5
30       -5
31       +5
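
The same approach can be tried without hitting the live site. The following is only a minimal sketch: the HTML in SAMPLE_HTML is made up to imitate the page's structure (an assumption, not the site's real markup), and the values are placeholders.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page; the real page may differ.
SAMPLE_HTML = """
<div class="op-item op-spread op-opening" data-op-info='{"fullgame": "-4.5", "firsthalf": "-2.5"}'></div>
<div class="op-item op-spread op-opening" data-op-info='{"fullgame": "+4.5", "firsthalf": "+2.5"}'></div>
<div class="op-item op-total op-opening" data-op-info='{"fullgame": "44"}'></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string matches the class attribute as an exact string,
# so the third div (a different class list) is not selected.
results = soup.find_all(class_="op-item op-spread op-opening")

# One dict per matching element, parsed from the JSON in data-op-info.
rows = [json.loads(tag["data-op-info"]) for tag in results]

# One DataFrame row per element, one column per JSON key.
df = pd.json_normalize(rows)
print(df)
```

Because the exact-string match depends on the order of the class names, a CSS selector such as soup.select("div.op-item.op-spread.op-opening") may be a more robust choice if the site ever reorders them.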

Or, if you really just want one key from the dictionary:

rlist = [json.loads(result['data-op-info'])['fullgame'] for result in results]
pd.DataFrame({'fullgame': rlist})

   fullgame
0      -4.5
1      +4.5
2        +7
3        -7
4        -3
5        +3
6        +3
7        -3
8        -3
9        +3
10       -3
11       +3
12       -1
13       +1
14     +2.5
15     -2.5
16       +4
17       -4
18     -2.5
19     +2.5
20     -2.5
21     +2.5
22     +2.5
23     -2.5
24     +1.5
25     -1.5
26     +5.5
27     -5.5
28     -3.5
29     +3.5
30       -5
31       +5
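
Indexing with ['fullgame'] raises a KeyError if any element's JSON lacks that key. Whether that ever happens on this page is not something the output above settles, so treat the following as a defensive sketch only. The raw_values strings and the extract_key helper are hypothetical stand-ins for result['data-op-info'], not part of the original answer.

```python
import json

# Hypothetical attribute values standing in for result['data-op-info'].
raw_values = [
    '{"fullgame": "-4.5", "firsthalf": "-2.5"}',
    '{"firsthalf": "+2.5"}',  # no "fullgame" key
    '{"fullgame": "+7"}',
]


def extract_key(raw_values, key, default=None):
    """Parse each JSON string and return the value stored under key.

    dict.get returns `default` instead of raising KeyError when the key is absent.
    """
    return [json.loads(raw).get(key, default) for raw in raw_values]


fullgame = extract_key(raw_values, "fullgame")
print(fullgame)  # ['-4.5', None, '+7']
```

The resulting list can be passed to pd.DataFrame({'fullgame': fullgame}) exactly as above; missing entries then show up as None rather than stopping the script.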

Upvotes: 1
