marcovc
marcovc

Reputation: 21

Extracting from a nested json file to a pandas dataframe

I have been trying to format a nested json file to a pandas dataframe but i may have missing something,

How can extract the timeseries onto a pandas dataframe? I have been struggling trying to extract all the numbering but if succesful I ended with some of metadata in a dataaframe

Please help!

{
"Meta Data": {
    "1. Information": "Intraday (60min) prices and volumes",
    "2. Symbol": "BHP.AX",
    "3. Last Refreshed": "2018-02-09 00:00:00",
    "4. Interval": "60min",
    "5. Output Size": "Compact",
    "6. Time Zone": "US/Eastern"
},
"Time Series (60min)": {
    "2018-02-09 00:00:00": {
        "1. open": "29.1100",
        "2. high": "29.1950",
        "3. low": "29.1000",
        "4. close": "29.1300",
        "5. volume": "788213"
    },
    "2018-02-08 23:00:00": {
        "1. open": "29.0000",
        "2. high": "29.2000",
        "3. low": "29.0000",
        "4. close": "29.1100",
        "5. volume": "768704"
    },
    "2018-02-08 22:00:00": {
        "1. open": "29.1000",
        "2. high": "29.1000",
        "3. low": "28.9600",
        "4. close": "29.0000",
        "5. volume": "830235"
    },
    "2018-02-08 21:00:00": {
        "1. open": "29.0850",
        "2. high": "29.2250",
        "3. low": "29.0750",
        "4. close": "29.1050",
        "5. volume": "803142"
    },
    "2018-02-08 20:00:00": {
        "1. open": "28.9200",
        "2. high": "29.1500",
        "3. low": "28.8900",
        "4. close": "29.0900",
        "5. volume": "1231131"
    }
  }
}

any Ideas ?

Upvotes: 1

Views: 657

Answers (2)

cs95
cs95

Reputation: 402323

You can use pd.DataFrame.from_dict, specifying an orient.

data = json.loads(json_data)
df = pd.DataFrame.from_dict(data['Time Series (60min)'], orient='index')

You can also cleanup your column headers if needed.

df.columns = df.columns.str.split('. ').str[1] # an optional step

df

                    open     high      low    close   volume
2018-02-08 20:00:00  28.9200  29.1500  28.8900  29.0900  1231131
2018-02-08 21:00:00  29.0850  29.2250  29.0750  29.1050   803142
2018-02-08 22:00:00  29.1000  29.1000  28.9600  29.0000   830235
2018-02-08 23:00:00  29.0000  29.2000  29.0000  29.1100   768704
2018-02-09 00:00:00  29.1100  29.1950  29.1000  29.1300   788213

Upvotes: 2

DYZ
DYZ

Reputation: 57033

If you need only the time series, just tell Pandas!

js = '''Your_JSON_String'''
jsdata = json.loads(js)
jsts = jsdata["Time Series (60min)"]
ts = pd.DataFrame(jsts).T
#                     1. open  2. high   3. low 4. close 5. volume
# 2018-02-08 20:00:00  28.9200  29.1500  28.8900  29.0900   1231131
# 2018-02-08 21:00:00  29.0850  29.2250  29.0750  29.1050    803142
# 2018-02-08 22:00:00  29.1000  29.1000  28.9600  29.0000    830235
# 2018-02-08 23:00:00  29.0000  29.2000  29.0000  29.1100    768704
# 2018-02-09 00:00:00  29.1100  29.1950  29.1000  29.1300    788213

Upvotes: 0

Related Questions