Converting a pandas dataframe to a nested dict in Python using groupby

Question

I have the following dataframe in Python:

import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

>>> my_df.head()
    device variable size  manual method  nrow  ncol
0   123456        a   80   False   beta  None  None
1   123456        b   80   False   beta  None  None
2  7891011        a   80   False   beta  None  None
3  7891011        b   80   False   beta  None  None

I want to convert it to the following nested dict structure:

{
'123456':
     {
     'a': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          },
     'b': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          }
     },
'7891011':
     {
     'a': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          },
     'b': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          }
     }
}

I can easily loop through the variables and do some filtering with pandas, but that does not seem very efficient. Is there a way to do that using df.groupby()?

Maybe:

my_df.groupby(["device", "variable"]).apply(list).to_dict()

But that messes up the key names.

Corralien · Accepted Answer

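First, group by device (level 1) and select every column except device. Within each group, set variable as the index (level 2) and convert the remaining columns to a dict with orient="index" (level 3). Finally, convert the resulting Series to a dict. Note that size comes out as the string "80" rather than the integer 80, because the column holds strings; cast it first (for example with astype) if you need integers.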

import json

# my_df is the dataframe defined in the question
d = my_df.groupby("device")[["variable", "size", "manual", "method", "nrow", "ncol"]] \
      .apply(lambda x: x.set_index("variable").to_dict(orient="index")) \
      .to_dict()
print(json.dumps(d, indent=4, sort_keys=True))

{
    "123456": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    },
    "7891011": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    }
}
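
If the integer size from the question's target structure matters, one possible variation is sketched below. It is not part of the answer above: it swaps groupby().apply() for a dict comprehension over the groups, and it assumes that every value in size is an integer-like string, so that the cast cannot fail. The names typed and nested are made up for this illustration.

```python
import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

# Assumption: "size" only ever holds integer-like strings such as "80".
typed = my_df.astype({"size": int})

# Level 1: one key per device.
# Level 2: "variable" becomes the index, so it supplies the inner keys.
# Level 3: orient="index" maps each row to a {column: value} dict.
nested = {
    device: group.drop(columns="device")
    .set_index("variable")
    .to_dict(orient="index")
    for device, group in typed.groupby("device")
}

print(nested["123456"]["a"]["size"])
```

Iterating over the groups directly avoids passing a lambda to apply, which some readers find easier to follow; for a frame this small the performance difference should not matter.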
