Reputation: 1751
I have the following dataframe in Python:
import pandas as pd
my_df = pd.DataFrame([["123456", "a", "80", False, "beta", None, None], ["123456", "b", "80", False, "beta", None, None], ["7891011", "a", "80", False, "beta", None, None], ["7891011", "b", "80", False, "beta", None, None]], columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"])
>>> my_df.head()
device variable size manual method nrow ncol
0 123456 a 80 False beta None None
1 123456 b 80 False beta None None
2 7891011 a 80 False beta None None
3 7891011 b 80 False beta None None
I want to convert it to the following nested dict structure:
{
'123456':
{
'a': {
'size': 80,
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None
},
'b': {
'size': 80,
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None
}
},
'7891011':
{
'a': {
'size': 80,
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None
},
'b': {
'size': 80,
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None
}
}
}
I can easily loop through the variables and do some filtering with pandas, but that does not seem very efficient. Is there a way to do this using df.groupby()?
Maybe:
my_df.groupby(["device", "variable"]).apply(list).to_dict()
But that messes up the key names.
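The key names come out wrong because grouping on two columns produces a two-level index, which to_dict() flattens into tuple keys instead of nesting them. A minimal sketch of that effect, swapping in size() for apply(list) purely so the values are easy to read:

```python
import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

# Grouping on two columns gives a MultiIndex; to_dict() turns each
# (device, variable) pair into a single tuple key, not a nested dict.
flat = my_df.groupby(["device", "variable"]).size().to_dict()
print(flat)
# {('123456', 'a'): 1, ('123456', 'b'): 1, ('7891011', 'a'): 1, ('7891011', 'b'): 1}
```

So the nesting has to be built explicitly, one level per key column.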
Upvotes: 3
Views: 1511
Reputation: 120391
First group by device (level 1) and keep all columns except device, then set variable as the index (level 2), and finally convert the remaining columns to a dict (level 3). At the end, convert the whole result to a dict.
import json
d = my_df.groupby("device")[["variable", "size", "manual", "method", "nrow", "ncol"]] \
.apply(lambda x: x.set_index("variable").to_dict(orient="index")) \
.to_dict()
print(json.dumps(d, indent=4, sort_keys=True))
{
"123456": {
"a": {
"manual": false,
"method": "beta",
"ncol": null,
"nrow": null,
"size": "80"
},
"b": {
"manual": false,
"method": "beta",
"ncol": null,
"nrow": null,
"size": "80"
}
},
"7891011": {
"a": {
"manual": false,
"method": "beta",
"ncol": null,
"nrow": null,
"size": "80"
},
"b": {
"manual": false,
"method": "beta",
"ncol": null,
"nrow": null,
"size": "80"
}
}
}
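Note that size is still the string "80" in this output, because the column holds strings in the question's dataframe, while the desired structure shows the integer 80. A minimal sketch of one way to get integers, assuming every value in size is a numeric string; it uses a dict comprehension over the groups instead of apply, and the names typed and nested are only illustrative:

```python
import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

# Assumption: "size" only contains numeric strings, so the cast cannot fail.
typed = my_df.astype({"size": int})

# One outer key per device; inside each group, "variable" becomes the
# second-level key and the remaining columns become the innermost dict.
nested = {
    device: group.drop(columns="device")
    .set_index("variable")
    .to_dict(orient="index")
    for device, group in typed.groupby("device")
}

print(nested["123456"]["a"]["size"])
```

If some rows may hold non-numeric sizes, pd.to_numeric with errors="coerce" would be a more forgiving choice than astype.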
Upvotes: 4
Reputation: 71689
Convert the required columns to records using to_dict and assign the records to a new column in the dataframe, then set the index of the dataframe to device and variable, unstack to reshape, and finally call to_dict with the index orientation:
c = ['device', 'variable']
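# Use columns= and the full orient names; recent pandas releases reject
# the positional axis argument and the 'r' / 'i' abbreviations.
my_df['rec'] = my_df.drop(columns=c).to_dict('records')
my_df.set_index(c)['rec'].unstack().to_dict('index')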
{'123456': {'a': {'size': '80',
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None},
'b': {'size': '80',
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None}},
'7891011': {'a': {'size': '80',
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None},
'b': {'size': '80',
'manual': False,
'method': 'beta',
'nrow': None,
'ncol': None}}}
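One side effect of this approach is that it adds a rec column to my_df. A sketch of a variant that leaves the original frame untouched, assuming that matters for your use case; the names key_cols, records and nested are only illustrative:

```python
import pandas as pd

my_df = pd.DataFrame(
    [
        ["123456", "a", "80", False, "beta", None, None],
        ["123456", "b", "80", False, "beta", None, None],
        ["7891011", "a", "80", False, "beta", None, None],
        ["7891011", "b", "80", False, "beta", None, None],
    ],
    columns=["device", "variable", "size", "manual", "method", "nrow", "ncol"],
)

key_cols = ["device", "variable"]

# Build the per-row records as a standalone Series indexed by
# (device, variable) instead of writing them back into my_df.
records = pd.Series(
    my_df.drop(columns=key_cols).to_dict("records"),
    index=pd.MultiIndex.from_frame(my_df[key_cols]),
)

# unstack() moves "variable" to the columns, so the index orientation
# yields {device: {variable: record}}.
nested = records.unstack().to_dict("index")
```

As with the original, unstack fills missing (device, variable) combinations with NaN, so this fits best when every device has the same set of variables.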
Upvotes: 1