eduardokapp

Reputation: 1751

Converting a pandas dataframe to a nested dict in Python using groupby

I have the following dataframe in Python:

import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

>>> my_df.head()
    device variable size  manual method  nrow  ncol
0   123456        a   80   False   beta  None  None
1   123456        b   80   False   beta  None  None
2  7891011        a   80   False   beta  None  None
3  7891011        b   80   False   beta  None  None

I want to convert it to the following nested dict structure:

{
'123456':
     {
     'a': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          },
     'b': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          }
     },
'7891011':
     {
     'a': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          },
     'b': {
          'size': 80,
          'manual': False,
          'method': 'beta',
          'nrow': None,
          'ncol': None
          }
     }
}

I can easily loop through the devices and variables and filter the dataframe for each one, but that does not seem very efficient. Is there a way to do this using df.groupby()?

Maybe:

my_df.groupby(["device", "variable"]).apply(list).to_dict()

But that messes up the key names.

Upvotes: 3

Views: 1511

Answers (2)

Corralien

Reputation: 120391

First, group by device (level 1) and keep all columns except device. Then, within each group, set variable as the index (level 2) and convert the remaining columns to a dict keyed by that index (level 3). Finally, convert the resulting Series to a dict.

import json

d = (
    my_df.groupby("device")[["variable", "size", "manual", "method", "nrow", "ncol"]]
    # Within each device, key the rows by variable
    .apply(lambda x: x.set_index("variable").to_dict(orient="index"))
    .to_dict()
)
print(json.dumps(d, indent=4, sort_keys=True))

{
    "123456": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    },
    "7891011": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    }
}
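
Note that size comes out as the string "80" because the sample dataframe stores it as a string, while the desired output in the question shows the integer 80. Below is a minimal sketch of one way to get integers, assuming every value in size is a numeric string. The dict comprehension over groupby is only an alternative spelling of the same grouping, and the names converted and nested are illustrative:

```python
import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

# Assumption: every value in "size" is a numeric string such as "80".
converted = my_df.astype({"size": int})

# One outer key per device; inside it, one key per variable.
nested = {
    device: group.drop(columns="device")
    .set_index("variable")
    .to_dict(orient="index")
    for device, group in converted.groupby("device")
}
```

If size can contain non-numeric strings, astype will raise, so the cast would need to be adapted to the real data.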

Upvotes: 4

Shubham Sharma

Reputation: 71689

Convert the required columns to records using to_dict and assign the records to a new column in the dataframe. Then set the index of the dataframe to device and variable, unstack to reshape, and call to_dict with orient 'index'.

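c = ['device', 'variable']
# One dict per row, holding every column except the two key columns
my_df['rec'] = my_df.drop(columns=c).to_dict('records')
# Rows become devices, columns become variables, cells hold the row dicts
my_df.set_index(c)['rec'].unstack().to_dict('index')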

{'123456': {'a': {'size': '80',
                  'manual': False,
                  'method': 'beta',
                  'nrow': None,
                  'ncol': None},
            'b': {'size': '80',
                  'manual': False,
                  'method': 'beta',
                  'nrow': None,
                  'ncol': None}},
 '7891011': {'a': {'size': '80',
                   'manual': False,
                   'method': 'beta',
                   'nrow': None,
                   'ncol': None},
             'b': {'size': '80',
                   'manual': False,
                   'method': 'beta',
                   'nrow': None,
                   'ncol': None}}}
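
One caveat: unstack requires each (device, variable) pair to appear at most once, otherwise it raises a ValueError about duplicate index entries. The following is a rough sketch of the same steps with that check made explicit. The helper name to_nested is hypothetical, and it uses assign so that the original dataframe is not modified:

```python
import pandas as pd


def to_nested(df, keys):
    """Hypothetical helper: nest the rows of df by the two key columns."""
    # unstack() cannot reshape if a key pair occurs more than once.
    if df.duplicated(subset=keys).any():
        raise ValueError(f"duplicate {keys} pairs; cannot unstack")
    # One dict per row, holding every column except the key columns.
    records = df.drop(columns=keys).to_dict("records")
    # assign() returns a copy, so df itself keeps its original columns.
    with_rec = df.assign(rec=records)
    return with_rec.set_index(keys)["rec"].unstack().to_dict("index")


my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

nested = to_nested(my_df, ["device", "variable"])
```

If some device lacks a variable that another device has, unstack fills that cell with NaN instead of a dict, so such entries would need to be filtered out afterwards.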

Upvotes: 1
