Anil Tiwari
Anil Tiwari

Reputation: 81

How to convert a nested dict into dataframe

Let's say I have an API response as:

{
    "fact": {
        "UP": [{
            "SCODE": "CNB",
            "SNAME": "Kanpur Central"
        }, {
            "SCODE": "JHS",
            "SNAME": "Jhansi Junction"
        }],
        "MP": [{
            "SCODE": "BPL",
            "SNAME": "Bhopal Junction"
        }, {
            "SCODE": "JBP",
            "SNAME": "Jabalpur Junction"
        }]
    }
}

I have to convert this into a dataframe which looks like below (expected output):

fact    SCODE   SNAME
UP      CNB     Kanpur Central
UP      JHS     Jhansi Junction
MP      BPL     Bhopal Junction
MP      JBP     Jabalpur Junction

My effort: I tried using json_normalize() but didn't reaches to expected output:

pd.json_normalize(response).apply(pd.Series.explode)

Upvotes: 1

Views: 117

Answers (4)

sammywemmy
sammywemmy

Reputation: 28644

Not as efficient as working directly within dictionaries (which the chosen answer does quite well):

    data = {
    "fact": {
        "UP": [{
            "SCODE": "CNB",
            "SNAME": "Kanpur Central"
        }, {
            "SCODE": "JHS",
            "SNAME": "Jhansi Junction"
        }],
        "MP": [{
            "SCODE": "BPL",
            "SNAME": "Bhopal Junction"
        }, {
            "SCODE": "JBP",
            "SNAME": "Jabalpur Junction"
        }]
    }
}

keys = data['fact']

   (pd.concat([jn(data['fact'][key]) for key in keys], 
              keys = keys)
      .droplevel(-1)
      .rename_axis(index='fact')
      .reset_index()
     )



fact SCODE              SNAME
0   UP   CNB     Kanpur Central
1   UP   JHS    Jhansi Junction
2   MP   BPL    Bhopal Junction
3   MP   JBP  Jabalpur Junction

Upvotes: 1

bruno-uy
bruno-uy

Reputation: 1875

  • Using the response from the OP.
  • You must create another structure since json_normalize works with list of dictionaries and the fact must be included there:
new_response = [{"fact": rfact, **r} for rfact in response["fact"] for r in response["fact"][rfact]]

Finally, you just need to apply the function:

final_result = pd.json_normalize(new_response)

  fact SCODE              SNAME
0   UP   CNB     Kanpur Central
1   UP   JHS    Jhansi Junction
2   MP   BPL    Bhopal Junction
3   MP   JBP  Jabalpur Junction

Upvotes: 1

ThePyGuy
ThePyGuy

Reputation: 18416

df = pd.DataFrame.from_dict(d)
df.index.name = df.columns[0]

df = (df.append(df['fact'].explode().apply(pd.Series)).drop(columns=['fact'])
        .dropna()
        .reset_index()
    )

OUTPUT:

  fact SCODE              SNAME
0   MP   BPL    Bhopal Junction
1   MP   JBP  Jabalpur Junction
2   UP   CNB     Kanpur Central
3   UP   JHS    Jhansi Junction

Upvotes: 0

Henry Ecker
Henry Ecker

Reputation: 35636

One option is to reshape with python:

df = pd.DataFrame([{'fact': k, **item}
                   for k, lst in response['fact'].items()
                   for item in lst])
  fact SCODE              SNAME
0   UP   CNB     Kanpur Central
1   UP   JHS    Jhansi Junction
2   MP   BPL    Bhopal Junction
3   MP   JBP  Jabalpur Junction

A pandas option via explode + apply pd.Series:

df = (
    pd.DataFrame(response)['fact']
        .explode()
        .apply(pd.Series)
        .rename_axis('fact')
        .reset_index()
)
  fact SCODE              SNAME
0   MP   BPL    Bhopal Junction
1   MP   JBP  Jabalpur Junction
2   UP   CNB     Kanpur Central
3   UP   JHS    Jhansi Junction

Upvotes: 5

Related Questions