chitown88

Reputation: 28565

python beautifulsoup - pull a list/dictionary

I'm still learning how to use BeautifulSoup. I've managed to use tags and whatnot to pull the data from the Depth Chart table at https://fantasydata.com/nfl-stats/team-details/CHI

But now I'm trying to pull the Full Roster table, and I can't figure out which tags to use for it. I do notice in the page source, though, that the data is stored in a list of dictionaries:

vm.Roster = [{"PlayerId":16236,"Name":"Cody Parkey","Team":"CHI","Position":"K","FantasyPosition":"K","Height":"6\u00270\"","Weight":189,"Number":1,"CurrentStatus":"Healthy","CurrentStatusCol

...

What's an elegant way to pull that Full Roster table? My thought was that if I could grab that list of dictionaries, I could convert it straight to a DataFrame. But I'm not sure how to grab it, or whether there's a better way to get that table into a pandas DataFrame in Python.

Upvotes: 0

Views: 336

Answers (1)

jpw

Reputation: 44871

One possible solution is to use a regular expression to extract the raw JSON array, which can then be loaded with the json library.
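To see the regex-plus-json step in isolation, here is a minimal offline sketch. The HTML string is a hypothetical, trimmed-down stand-in for the page source, modeled on the `vm.Roster` line quoted in the question; the real page carries many more fields and players and may be laid out differently.

```python
import json
import re

# Hypothetical, trimmed-down page source (not the real page).
html = """
<script>
    vm.Roster = [{"PlayerId":16236,"Name":"Cody Parkey","Team":"CHI","Position":"K","Number":1}];
</script>
"""

# Escape the dot so it matches a literal "." and capture the bracketed array.
# "." does not match newlines, so the greedy ".*" runs to the last "]" on that line only.
match = re.search(r"vm\.Roster = (\[.*\])", html)
roster = json.loads(match.group(1))

print(roster[0]["Name"])      # Cody Parkey
print(roster[0]["Position"])  # K
```

The greedy `.*` relies on the whole array sitting on one line; if the real page wraps it across lines, the pattern would need the `re.DOTALL` flag and a tighter end marker.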

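from bs4 import BeautifulSoup
from urllib.request import urlopen
import re
import json

html_page = urlopen("https://fantasydata.com/nfl-stats/team-details/CHI")
soup = BeautifulSoup(html_page, "html.parser")

# The roster is embedded in an inline <script> as "vm.Roster = [...]".
# Recent BeautifulSoup releases leave <script> contents out of soup.text,
# so locate the script tag itself and search its string.
script = soup.find("script", string=re.compile(r"vm\.Roster"))
raw_data = re.search(r"vm\.Roster = (\[.*\])", script.string).group(1)
data = json.loads(raw_data)

print(data[0]["Name"])  # Cody Parkey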
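Since the end goal in the question is a DataFrame, and `json.loads` returns a list of dictionaries, pandas can build the table directly from that list: one row per dictionary, one column per key. A minimal sketch, assuming pandas is installed; the one-record list below is a stand-in for `data`, using only fields quoted in the question.

```python
import pandas as pd

# Stand-in for `data` from the snippet above: a list of dicts, one per player.
data = [
    {"PlayerId": 16236, "Name": "Cody Parkey", "Team": "CHI", "Position": "K",
     "FantasyPosition": "K", "Weight": 189, "Number": 1, "CurrentStatus": "Healthy"},
]

# Each dict becomes a row; the dict keys become the column names.
roster_df = pd.DataFrame(data)

# Show a few columns of interest, indexed by jersey number.
print(roster_df[["Number", "Name", "Position"]].set_index("Number"))
```

If some entries hold nested objects, `pd.json_normalize(data)` flattens them into dotted column names instead of leaving dicts inside cells.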

It should be noted that scraping data from that particular site in this fashion most likely violates their terms of service and might even be illegal in some jurisdictions.

Upvotes: 1
