skywalker

Reputation: 79

Python: Build DataFrame from parts of JSON response

I am trying to develop an application to retrieve stock prices (in JSON) and then do some analysis on them. My problem is with getting the JSON response into a pandas DataFrame where I can work. Here is my code:

'''
References
http://stackoverflow.com/questions/6862770/python-3-let-json-object-accept-bytes-or-let-urlopen-output-strings
'''
import json
import pandas as pd
from urllib.request import urlopen

#set API call
url = "https://www.quandl.com/api/v3/datasets/WIKI/AAPL.json?start_date=2017-01-01&end_date=2017-01-31"

#make call and receive response
response = urlopen(url).read().decode('utf8')
dataresponse = json.loads(response)

#check incoming
#print(dataresponse)

df = pd.read_json(dataresponse)

print(df)

The application fails at df = pd.read_json(dataresponse) with TypeError: Expected String or Unicode.

So I reckon this is the first hurdle.

The second hurdle is getting the data I actually need. The JSON response contains only two arrays I am interested in, column_names and data. How do I extract just these two and put them into a pandas DataFrame?

Upvotes: 1

Views: 2377

Answers (1)

IanS

Reputation: 16251

To answer your first question, pd.read_json takes a JSON string directly, not the dictionary returned by json.loads, so you should pass it the raw response:

pd.read_json(response)
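To see why the original call raised the TypeError, it may help to compare the two objects. This is a minimal sketch using a made-up payload (not real API output): json.loads has already turned the text into a dict, which is not what pd.read_json expects. As far as I know, newer pandas releases also prefer a file-like object such as io.StringIO(response) over a raw string.

```python
import json

# Hypothetical stand-in for the decoded response text; placeholder values only.
response = '{"dataset": {"column_names": ["Date"], "data": [["2017-01-03"]]}}'

# json.loads parses the text into plain Python objects.
dataresponse = json.loads(response)

print(type(response).__name__)      # the raw text that pd.read_json can parse
print(type(dataresponse).__name__)  # the parsed object that triggered the TypeError
```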

However, given how the data is structured, it is better to first convert the JSON string to a dictionary:

d = json.loads(response)

Then simply build the dataframe from d['dataset']['data'] and d['dataset']['column_names']:

pd.DataFrame(data=d['dataset']['data'], columns=d['dataset']['column_names'])
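Putting the steps together, here is a small self-contained sketch that runs without a network call. The payload, its values, and the helper name dataset_to_frame are made up for illustration; only the dataset / column_names / data layout comes from the question. Parsing the Date column and using it as the index is my own addition, assuming that column exists and holds ISO-formatted dates.

```python
import json
import pandas as pd

# Hypothetical stand-in for the decoded API response.
# The numbers are placeholders, not real prices.
sample_response = json.dumps({
    "dataset": {
        "column_names": ["Date", "Open", "Close"],
        "data": [
            ["2017-01-04", 1.0, 1.5],
            ["2017-01-03", 2.0, 2.5],
        ],
    }
})


def dataset_to_frame(response_text):
    """Parse the JSON text and build a DataFrame from the two arrays of interest."""
    d = json.loads(response_text)
    dataset = d["dataset"]
    df = pd.DataFrame(data=dataset["data"], columns=dataset["column_names"])
    # Assumption: a "Date" column exists and holds ISO-formatted dates.
    df["Date"] = pd.to_datetime(df["Date"])
    # Index by date and sort oldest first, which is convenient for time-series work.
    return df.set_index("Date").sort_index()


df = dataset_to_frame(sample_response)
print(df)
```

With the real response you would pass the decoded text from urlopen(url).read().decode('utf8') instead of sample_response.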

Upvotes: 5
