rakamakafo
rakamakafo

Reputation: 159

I am trying to convert deeply nested JSON into pandas dataframe

I am trying to convert a json returned from the API call into pandas dataframe. Ideally I would like to extract only 'Type','Name' and 'SUPPLY'.

I have tried multiple things, such as flatten(), json_normalize() and so on, but couldn't make it to work.

def get_cryptocompare_data():
    url = "https://min-api.cryptocompare.com/data/top/mktcapfull?limit=15&tsym=USD"
    data = requests.get(url)
    d = data.json()

I would like to convert it to pandas dataframe, with columns for 'Type','Name','SUPPLY'.

Upvotes: 0

Views: 1048

Answers (3)

sammywemmy
sammywemmy

Reputation: 28729

Jmespath could help here with nested paths - basic summary is if u encounter a list, represent it with a bracket([]), if it is a key, access it wih dot notation (.) :

import requests
url = "https://min-api.cryptocompare.com/data/top/mktcapfull?limit=15&tsym=USD"
data = requests.get(url).json()

#example : access the name key :
#path : Data -> list -> CoinInfo(dict) ->Name(dict)
#representation : Data[].CoinInfo.Name
#in words : Data, then list, then CoinInfo key, then Name key

import jmespath
expression = jmespath.compile('''{name:Data[].CoinInfo.Name,
                                  type:Data[].CoinInfo.Type,
                                  supply:Data[].RAW.USD.SUPPLY}''')
res = expression.search(data)

Read in data

pd.DataFrame(res)

    name    type    supply
0   BTC     1   1.833352e+07
1   ETH     1   1.105811e+08
2   XRP     1   9.999185e+10
3   GAPS    1   2.000000e+09
4   CRO     1   1.000000e+11
5   USDT    1   4.642367e+09
6   BCH     1   1.838416e+07
7   PLF     1   1.000000e+10
8   CTAG    1   4.000000e+09
9   LINK    1   1.000000e+09

Upvotes: 1

Rich Andrews
Rich Andrews

Reputation: 1680

The goal here is to flatten data.json() which is a dict returned by requests.

Flattening dictionaries is a reasonable complex topic because of the various types of values that can be found within. But here, the data feed is known.

Therefore, a simply flattening function can be found from various python recipes and functionally applied to the data obtained from the data feed.

import json
import itertools
import requests
import json
import pandas as pd

def get_cryptocompare_data():
    url = "https://min-api.cryptocompare.com/data/top/mktcapfull?limit=15&tsym=USD"
    response = requests.get(url)
    d = response.json()
    return d

# https://codereview.stackexchange.com/questions/21033/flatten-dictionary-in-python-functional-style
def flatten_dict(d):
    def items():
        for key, value in d.items():
            if isinstance(value, dict):
                for subkey, subvalue in flatten_dict(value).items():
                    yield key + "." + subkey, subvalue
            else:
                yield key, value

    return dict(items())

d = get_cryptocompare_data()
data = d['Data']
data = list(map(flatten_dict, data))
df = pd.DataFrame(data)
print(df[['CoinInfo.Name', 'CoinInfo.Type', 'RAW.USD.TYPE', 'RAW.USD.SUPPLY', 'DISPLAY.USD.SUPPLY']])

Will provide the following name, type and supply columns

  CoinInfo.Name  CoinInfo.Type RAW.USD.TYPE  RAW.USD.SUPPLY    DISPLAY.USD.SUPPLY
0           BTC              1            5    1.764898e+07        Ƀ 17,648,975.0
1           XRP              1            5    9.999185e+10  XRP 99,991,850,794.0
2           ETH              1            5    1.056861e+08       Ξ 105,686,063.1
3           EOS              1            5    1.041886e+09   EOS 1,041,886,454.0
4           BCH              1            5    1.764895e+07      BCH 17,648,946.0
5           LTC              1            5    6.136893e+07        Ł 61,368,933.6
6           BNB              1            5    1.883456e+08     BNB 188,345,602.2
7          USDT              1            5    2.468906e+09     ₮ 2,468,905,774.0
8           XLM              1            5    1.932851e+10  XLM 19,328,512,453.0
9           ADA              1            5    2.592707e+10  ADA 25,927,070,538.0

Upvotes: 0

pythonjokeun
pythonjokeun

Reputation: 431

I prefer the old way (for loop) when dealing with heavily nested JSON like this because it is simple to understand.

import pandas as pd

records = []

for item in d["Data"]:
    records.append(
        {
            "Type": item["CoinInfo"]["Type"],
            "Name": item["CoinInfo"]["Name"],
            "SUPPLY": item["RAW"]["USD"]["SUPPLY"],
        }
    )


df = pd.DataFrame.from_records(records)

df.head()

I don't set the Type as the index because i'm not sure why you want Type as the index since it only has 1 value. If you need to set Type as the index, just add df.set_index("Type", inplace=True)

Upvotes: 0

Related Questions