Reputation: 21
I have been trying to format a nested json file to a pandas dataframe but i may have missing something,
How can extract the timeseries onto a pandas dataframe? I have been struggling trying to extract all the numbering but if succesful I ended with some of metadata in a dataaframe
Please help!
{
"Meta Data": {
"1. Information": "Intraday (60min) prices and volumes",
"2. Symbol": "BHP.AX",
"3. Last Refreshed": "2018-02-09 00:00:00",
"4. Interval": "60min",
"5. Output Size": "Compact",
"6. Time Zone": "US/Eastern"
},
"Time Series (60min)": {
"2018-02-09 00:00:00": {
"1. open": "29.1100",
"2. high": "29.1950",
"3. low": "29.1000",
"4. close": "29.1300",
"5. volume": "788213"
},
"2018-02-08 23:00:00": {
"1. open": "29.0000",
"2. high": "29.2000",
"3. low": "29.0000",
"4. close": "29.1100",
"5. volume": "768704"
},
"2018-02-08 22:00:00": {
"1. open": "29.1000",
"2. high": "29.1000",
"3. low": "28.9600",
"4. close": "29.0000",
"5. volume": "830235"
},
"2018-02-08 21:00:00": {
"1. open": "29.0850",
"2. high": "29.2250",
"3. low": "29.0750",
"4. close": "29.1050",
"5. volume": "803142"
},
"2018-02-08 20:00:00": {
"1. open": "28.9200",
"2. high": "29.1500",
"3. low": "28.8900",
"4. close": "29.0900",
"5. volume": "1231131"
}
}
}
any Ideas ?
Upvotes: 1
Views: 657
Reputation: 402323
You can use pd.DataFrame.from_dict
, specifying an orient
.
data = json.loads(json_data)
df = pd.DataFrame.from_dict(data['Time Series (60min)'], orient='index')
You can also cleanup your column headers if needed.
df.columns = df.columns.str.split('. ').str[1] # an optional step
df
open high low close volume
2018-02-08 20:00:00 28.9200 29.1500 28.8900 29.0900 1231131
2018-02-08 21:00:00 29.0850 29.2250 29.0750 29.1050 803142
2018-02-08 22:00:00 29.1000 29.1000 28.9600 29.0000 830235
2018-02-08 23:00:00 29.0000 29.2000 29.0000 29.1100 768704
2018-02-09 00:00:00 29.1100 29.1950 29.1000 29.1300 788213
Upvotes: 2
Reputation: 57033
If you need only the time series, just tell Pandas!
js = '''Your_JSON_String'''
jsdata = json.loads(js)
jsts = jsdata["Time Series (60min)"]
ts = pd.DataFrame(jsts).T
# 1. open 2. high 3. low 4. close 5. volume
# 2018-02-08 20:00:00 28.9200 29.1500 28.8900 29.0900 1231131
# 2018-02-08 21:00:00 29.0850 29.2250 29.0750 29.1050 803142
# 2018-02-08 22:00:00 29.1000 29.1000 28.9600 29.0000 830235
# 2018-02-08 23:00:00 29.0000 29.2000 29.0000 29.1100 768704
# 2018-02-09 00:00:00 29.1100 29.1950 29.1000 29.1300 788213
Upvotes: 0