Reputation: 569
I would like to turn the results of this web scrape into a pandas DataFrame so that I can export the data to Excel. Is anyone familiar with how to do this? I have seen different methods online and on this site, but I have been unable to reproduce their results with this scrape.
Here is the code so far:
import requests
source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
This site seems useful but the examples are different:
https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
Here is another example from stackoverflow.com:
Loading web scraping results into Pandas DataFrame
I am new to coding and scraping, so any help would be greatly appreciated. Thanks in advance for your time and effort!
Upvotes: 2
Views: 5361
Reputation: 7416
Here is a solution that builds a dataframe with one column per team. I hope this helps. Updated code:
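import pandas as pd
import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
players = []  # one list of player names per team, in the same order as `teams`
teams = []    # column names: home team, then away team, for each game
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    teams.append(team['home_route'].capitalize())
    teams.append(team['away_route'].capitalize())
    temp = []   # home player names for this game
    temp1 = []  # away player names for this game
    for player in team['home_players']:
        print(player['name'])
        temp.append(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
        temp1.append(player['name'])
    players.append(temp)
    players.append(temp1)

# Build an empty frame with one column per team, then fill each column.
# Note: this assumes every team lists the same number of players;
# pandas raises a ValueError if the list lengths differ.
df = pd.DataFrame(columns=teams)
for i, team_name in enumerate(df.columns):
    df[team_name] = players[i]
df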
To export to Excel, you can do:
df.to_excel('result.xlsx')
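If the rosters can differ in length, the column-by-column assignment above would fail. As a minimal sketch of one way around that, the function below wraps each roster in a pd.Series so that pandas pads shorter columns with NaN. The `sample` dict and the `lineups_to_frame` helper are hypothetical: the sample only imitates the keys the loop above reads, and the real API response contains many more fields.

```python
import pandas as pd

# Hypothetical sample imitating the shape of the API response
# (only the keys used above); names and values are made up.
sample = {
    "data": [
        {
            "home_route": "boston-celtics",
            "away_route": "atlanta-hawks",
            "home_players": [{"name": "A. One"}, {"name": "B. Two"}, {"name": "C. Three"}],
            "away_players": [{"name": "D. Four"}, {"name": "E. Five"}],
        }
    ]
}


def lineups_to_frame(payload):
    """Return a DataFrame with one column per team and one player name per row."""
    columns = {}
    for game in payload["data"]:
        for side in ("home", "away"):
            team = game[side + "_route"].capitalize()
            names = [player["name"] for player in game[side + "_players"]]
            # A Series per team lets pandas align on the index and
            # fill the shorter rosters with NaN instead of raising.
            columns[team] = pd.Series(names, dtype="object")
    return pd.DataFrame(columns)


df = lineups_to_frame(sample)
print(df)
```

With the real response, `lineups_to_frame(source)` could be used in place of the loop, and `df.to_excel('result.xlsx')` called on the result. Writing .xlsx files requires an Excel engine such as openpyxl to be installed.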
Upvotes: 6
Reputation: 8816
You can try the following:
>>> import pandas as pd
>>> import json
>>> import requests
>>> source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
>>> df = pd.DataFrame.from_dict(source)  # source is already a dict, so it can be passed in directly
Now you can write the dataframe out in CSV format with df.to_csv, as follows:
>>> df.to_csv("nba_play.csv")
Below are the columns, which you can process as needed for your data:
>>> df.columns
Index(['bottom_header', 'bottom_paragraph', 'data', 'heading',
'intro_paragraph', 'page_title', 'twitter_link'],
dtype='object')
However, as Charles said, you can use json_normalize, which gives a better view of the data in tabular form (in pandas 1.0 and later it is also available as pd.json_normalize):
>>> from pandas.io.json import json_normalize
>>> json_normalize(df['data']).head()
away_bets.key away_bets.moneyline away_bets.over_under \
0 ATL 500 o232.0
1 POR 165 o217.0
2 SAC 320 o225.0
3 BKN 110 o216.0
4 TOR -140 o221.0
away_bets.over_under_moneyline away_bets.spread \
0 -115 11.0
1 -115 4.5
2 -105 9.0
3 -105 2.0
4 -105 -2.0
away_bets.spread_moneyline away_bets.total \
0 -110 121.50
1 -105 110.75
2 -115 117.00
3 -110 109.00
4 -115 109.50
away_injuries \
0 [{'name': 'J. Collins', 'profile_url': '/nba/p...
1 [{'name': 'M. Harkless', 'profile_url': '/nba/...
2 [{'name': 'K. Koufos', 'profile_url': '/nba/pl...
3 [{'name': 'T. Graham', 'profile_url': '/nba/pl...
4 [{'name': 'O. Anunoby', 'profile_url': '/nba/p...
away_players away_route \
0 [{'draftkings_projection': 30.04, 'yahoo_posit... atlanta-hawks
1 [{'draftkings_projection': 47.33, 'yahoo_posit... portland-trail-blazers
2 [{'draftkings_projection': 28.88, 'yahoo_posit... sacramento-kings
3 [{'draftkings_projection': 37.02, 'yahoo_posit... brooklyn-nets
4 [{'draftkings_projection': 45.2, 'yahoo_positi... toronto-raptors
... nav.matchup_season nav.matchup_time \
0 ... 2019 2018-10-29T23:00:00+00:00
1 ... 2019 2018-10-29T23:00:00+00:00
2 ... 2019 2018-10-29T23:30:00+00:00
3 ... 2019 2018-10-29T23:30:00+00:00
4 ... 2019 2018-10-30T00:00:00+00:00
nav.status.away_team_score nav.status.home_team_score nav.status.minutes \
0 None None None
1 None None None
2 None None None
3 None None None
4 None None None
nav.status.quarter_integer nav.status.seconds nav.status.status \
0 None Scheduled
1 None Scheduled
2 None Scheduled
3 None Scheduled
4 None Scheduled
nav.updated order
0 2018-10-29T17:51:05+00:00 0
1 2018-10-29T17:51:05+00:00 1
2 2018-10-29T17:51:05+00:00 2
3 2018-10-29T17:51:05+00:00 3
4 2018-10-29T17:51:05+00:00 4
[5 rows x 383 columns]
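Columns such as away_players and away_injuries in the output above still hold lists of dicts. As a rough sketch, assuming pandas 1.0 or later (where the function is exposed as pd.json_normalize), the record_path and meta arguments can expand those lists into one row per player. The `sample` payload and the `players_long` helper are hypothetical and only imitate the relevant keys.

```python
import pandas as pd

# Hypothetical sample imitating the nested shape of source['data'];
# player names and positions are made up.
sample = {
    "data": [
        {
            "home_route": "boston-celtics",
            "away_route": "atlanta-hawks",
            "home_players": [
                {"name": "A. One", "position": "PG"},
                {"name": "B. Two", "position": "C"},
            ],
            "away_players": [{"name": "C. Three", "position": "SG"}],
        },
        {
            "home_route": "toronto-raptors",
            "away_route": "brooklyn-nets",
            "home_players": [{"name": "D. Four", "position": "PF"}],
            "away_players": [{"name": "E. Five", "position": "SF"}],
        },
    ]
}


def players_long(payload):
    """Return one row per player with the team and side (home/away) attached."""
    frames = []
    for side in ("home", "away"):
        # record_path picks the nested list to expand into rows;
        # meta copies the parent-level team name onto each row.
        frame = pd.json_normalize(
            payload["data"],
            record_path=side + "_players",
            meta=[side + "_route"],
        )
        frame = frame.rename(columns={side + "_route": "team"})
        frame["side"] = side
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


long_df = players_long(sample)
print(long_df)
```

A long table like this is often easier to filter or pivot than one column per team, and it exports with to_csv or to_excel in the same way.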
Hope this helps.
Upvotes: 1
Reputation: 4265
Python requests conveniently renders the JSON as a dict, so you can just use the dict in a pd.DataFrame constructor.
import pandas as pd
# dict1, dict2, dict3 are placeholders for your own row dicts
df = pd.DataFrame([dict1, dict2, dict3])
# Do your data processing here
df.to_csv("myfile.csv")
Pandas also has pd.io.json with helpers like json_normalize, so once your data is in a dataframe you can process nested JSON into tabular data, and so on.
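To illustrate the difference between the plain constructor and json_normalize, here is a small sketch that assumes pandas 1.0 or later (for pd.json_normalize). The `records` list is a hand-written stand-in for rows of the API data, using a nested `away_bets` dict like the one that appears in the other answer's output.

```python
import pandas as pd

# Hand-written stand-in for two rows of nested JSON data.
records = [
    {"away_route": "atlanta-hawks",
     "away_bets": {"key": "ATL", "spread": 11.0, "total": 121.5}},
    {"away_route": "portland-trail-blazers",
     "away_bets": {"key": "POR", "spread": 4.5, "total": 110.75}},
]

# The plain constructor keeps each nested dict as a single cell value.
raw = pd.DataFrame(records)

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)

print(raw.columns.tolist())
print(flat.columns.tolist())
```

The flattened frame has scalar values in every cell, which is usually what you want before calling to_csv.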
Upvotes: 1