Pandas dataframe groups to nested dict

Question

I have a dataframe (below) that I need to turn into a nested dict keyed by PERSON_ID, then YEAR. I've been able to get a dict this way:

frame = frame.T.to_dict('dict')

But unfortunately I need it structured like this:

{76129: {1951: 'IN': 3.77551684175021, 'OUT': 6.02818626979883,
         1952: 'IN': 3.67945267132245, 'OUT': 1.7685974058508,
         1953: 'IN': 3.53030183426851, 'OUT': 0.409577500579766}}

If anyone has ideas on how to approach this I would appreciate it.

	PERSON_ID	YEAR	IN	OUT
0	76129	1951	3.77551684175021	6.02818626979883
1	76224	1951	9.3791597299824	9.53608578598666
2	76250	1951	0.729347478193212	5.74296130666972
3	76322	1951	0.922030969294425	8.95933733613574
4	76129	1952	3.67945267132245	1.7685974058508
5	76224	1952	2.43404429471111	7.97540821827656
6	76250	1952	7.26162056498856	9.76505935514356
7	76322	1952	8.66970822529531	7.50026191441197
8	76129	1953	3.53030183426851	0.409577500579766
9	76224	1953	3.45390554224515	3.20774562896629
10	76250	1953	6.63976713572943	5.48027529875715
11	76322	1953	7.87048287939222	0.610433799575476

Bill · Accepted Answer

Assuming you actually want a nested dictionary like this (note the extra braces):

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
         1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
         1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ... etc.
}

Here is a step-by-step approach.

First, create a dataframe with the desired (PERSON_ID, YEAR) multi-index:

frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()
print(frame_sorted)

Output:

                      IN       OUT
PERSON_ID YEAR                    
76129     1951  3.775517  6.028186
          1952  3.679453  1.768597
          1953  3.530302  0.409578
... etc.
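
As a quick check of what the multi-index gives you, here is a minimal sketch that rebuilds a few rows of the question's data (only two of the four PERSON_IDs and two years, to keep it short). The names row and per_person are just for illustration.

```python
import pandas as pd

# A subset of the rows from the question, typed in by hand
frame = pd.DataFrame({
    "PERSON_ID": [76129, 76224, 76129, 76224],
    "YEAR": [1951, 1951, 1952, 1952],
    "IN": [3.77551684175021, 9.3791597299824,
           3.67945267132245, 2.43404429471111],
    "OUT": [6.02818626979883, 9.53608578598666,
            1.7685974058508, 7.97540821827656],
})

frame_sorted = frame.set_index(["PERSON_ID", "YEAR"]).sort_index()

# A full (PERSON_ID, YEAR) key selects one row as a Series
row = frame_sorted.loc[(76129, 1952)]
print(row.to_dict())

# Only the first level selects a sub-frame indexed by YEAR
per_person = frame_sorted.loc[76129]
print(list(per_person.index))
```

Selecting by the first level alone is what the next step relies on: each person's sub-frame is already indexed by YEAR.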

Then, create the nested dict using a nested dictionary comprehension:

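# Unique PERSON_ID values, taken from the first level of the multi-index
person_ids = frame_sorted.index.levels[0]
# Outer dict keyed by person; inner dict keyed by YEAR, with each row as {'IN': ..., 'OUT': ...}
data_dict = {person: {idx: data.to_dict() for idx, data in frame_sorted.loc[person].iterrows()}
             for person in person_ids}
print(data_dict)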

Output:

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
  1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
  1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ...etc.
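
An alternative that avoids iterrows, offered as a sketch rather than a replacement: group by PERSON_ID and let to_dict('index') build the inner dicts. It assumes each (PERSON_ID, YEAR) pair appears only once, because to_dict('index') raises a ValueError on a duplicated index. It also assumes PERSON_ID is an integer column. The name nested is mine, and the sample frame below is a hand-typed subset of the question's rows.

```python
import pandas as pd

# A subset of the rows from the question, typed in by hand
frame = pd.DataFrame({
    "PERSON_ID": [76129, 76224, 76129, 76224],
    "YEAR": [1951, 1951, 1952, 1952],
    "IN": [3.77551684175021, 9.3791597299824,
           3.67945267132245, 2.43404429471111],
    "OUT": [6.02818626979883, 9.53608578598666,
            1.7685974058508, 7.97540821827656],
})

# One group per person; within each group, index by YEAR and
# convert the rows to {YEAR: {'IN': ..., 'OUT': ...}}
nested = {
    int(person): group.set_index("YEAR")[["IN", "OUT"]].to_dict("index")
    for person, group in frame.groupby("PERSON_ID")
}

print(nested[76129][1951])
```

The int(person) call only makes sure the outer keys are plain Python ints; drop it if your IDs are not integers.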
