Nick

Reputation: 394

Pandas dataframe groups to nested dict

I have a dataframe (below) that I need to turn into a nested dict keyed by PERSON_ID, then by YEAR. I've been able to convert it to a flat dict this way:

frame = frame.T.to_dict('dict')
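A minimal sketch of why this gives the wrong shape, using a hypothetical three-row excerpt of the data (values copied from the table below): transposing turns each row label into a column, so the outer keys of the result are row labels rather than PERSON_IDs. Because the transposed frame mixes int and float columns into one float64 array, PERSON_ID also comes back as a float.

```python
import pandas as pd

# Hypothetical three-row excerpt of the question's data
frame = pd.DataFrame({
    'PERSON_ID': [76129, 76224, 76129],
    'YEAR': [1951, 1951, 1952],
    'IN': [3.77551684175021, 9.3791597299824, 3.67945267132245],
    'OUT': [6.02818626979883, 9.53608578598666, 1.7685974058508],
})

# After .T the row labels 0, 1, 2 are the columns, so they become the outer keys
flat = frame.T.to_dict('dict')
print(list(flat))                # [0, 1, 2]
print(list(flat[0]))             # ['PERSON_ID', 'YEAR', 'IN', 'OUT']
# Mixed int/float columns are upcast to float64 by the transpose
print(flat[0]['PERSON_ID'])      # 76129.0
```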

But unfortunately, I need it structured like this:

{76129: {1951: 'IN': 3.77551684175021, 'OUT': 6.02818626979883,
         1952: 'IN': 3.67945267132245, 'OUT': 1.7685974058508,
         1953: 'IN': 3.53030183426851, 'OUT': 0.409577500579766}}

If anyone has ideas on how to approach this I would appreciate it.

PERSON_ID YEAR IN OUT
0 76129 1951 3.77551684175021 6.02818626979883
1 76224 1951 9.3791597299824 9.53608578598666
2 76250 1951 0.729347478193212 5.74296130666972
3 76322 1951 0.922030969294425 8.95933733613574
4 76129 1952 3.67945267132245 1.7685974058508
5 76224 1952 2.43404429471111 7.97540821827656
6 76250 1952 7.26162056498856 9.76505935514356
7 76322 1952 8.66970822529531 7.50026191441197
8 76129 1953 3.53030183426851 0.409577500579766
9 76224 1953 3.45390554224515 3.20774562896629
10 76250 1953 6.63976713572943 5.48027529875715
11 76322 1953 7.87048287939222 0.610433799575476
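To make the answers reproducible, here is the table above rebuilt as a DataFrame. The values are copied from the printout; the name `frame` matches the question.

```python
import pandas as pd

# The question's sample data, rebuilt from the printed table
frame = pd.DataFrame({
    'PERSON_ID': [76129, 76224, 76250, 76322] * 3,
    'YEAR': [1951] * 4 + [1952] * 4 + [1953] * 4,
    'IN': [3.77551684175021, 9.3791597299824, 0.729347478193212, 0.922030969294425,
           3.67945267132245, 2.43404429471111, 7.26162056498856, 8.66970822529531,
           3.53030183426851, 3.45390554224515, 6.63976713572943, 7.87048287939222],
    'OUT': [6.02818626979883, 9.53608578598666, 5.74296130666972, 8.95933733613574,
            1.7685974058508, 7.97540821827656, 9.76505935514356, 7.50026191441197,
            0.409577500579766, 3.20774562896629, 5.48027529875715, 0.610433799575476],
})
print(frame.shape)  # (12, 4)
```

One of the answers calls the frame `df`; `df = frame` bridges the two.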

Upvotes: 1

Views: 288

Answers (2)

Bill

Reputation: 11603

Assuming you actually want a nested dictionary like this (note the extra braces):

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
         1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
         1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ... etc.
}

Here is a step-by-step approach.

First, create a dataframe with the desired (PERSON_ID, YEAR) multi-index:

frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()
print(frame_sorted)

Output:

                      IN       OUT
PERSON_ID YEAR                    
76129     1951  3.775517  6.028186
          1952  3.679453  1.768597
          1953  3.530302  0.409578
... etc.
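Selecting a single PERSON_ID from this MultiIndex with `.loc` drops that level and leaves a small frame indexed by YEAR alone; that sub-frame is what the comprehension in the next step iterates over. A minimal sketch, assuming a hypothetical two-person excerpt of the data:

```python
import pandas as pd

# Hypothetical excerpt: two people, two years each
frame = pd.DataFrame({
    'PERSON_ID': [76129, 76224, 76129, 76224],
    'YEAR': [1951, 1951, 1952, 1952],
    'IN': [3.77551684175021, 9.3791597299824, 3.67945267132245, 2.43404429471111],
    'OUT': [6.02818626979883, 9.53608578598666, 1.7685974058508, 7.97540821827656],
})
frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()

# A scalar key on the first level removes PERSON_ID from the index
sub = frame_sorted.loc[76129]
print(sub.index.name)       # YEAR
print(sub.index.tolist())   # [1951, 1952]
print(list(sub.columns))    # ['IN', 'OUT']
```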

Then, create the nested dict using a nested dictionary comprehension:

person_ids = frame_sorted.index.levels[0]
data_dict = {person: {idx: data.to_dict() for idx, data in frame_sorted.loc[person].iterrows()}
             for person in person_ids}
print(data_dict)

Output:

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
  1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
  1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ...etc.
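Since each `frame_sorted.loc[person]` is already a YEAR-indexed frame, `to_dict('index')` can build the inner `{YEAR: {'IN': ..., 'OUT': ...}}` level in one call instead of the inner `iterrows` loop. This is an alternative sketch, not the answer's own code, on a hypothetical excerpt of the data. It lists the IDs with `index.unique(level='PERSON_ID')`, which, unlike `index.levels[0]`, never includes IDs that were sliced out of the frame.

```python
import pandas as pd

# Hypothetical excerpt of the question's data
frame = pd.DataFrame({
    'PERSON_ID': [76129, 76224, 76129, 76224],
    'YEAR': [1951, 1951, 1952, 1952],
    'IN': [3.77551684175021, 9.3791597299824, 3.67945267132245, 2.43404429471111],
    'OUT': [6.02818626979883, 9.53608578598666, 1.7685974058508, 7.97540821827656],
})
frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()

# Outer level: one entry per PERSON_ID actually present in the index.
# Inner level: to_dict('index') maps each YEAR to {'IN': ..., 'OUT': ...}.
data_dict = {
    person: frame_sorted.loc[person].to_dict('index')
    for person in frame_sorted.index.unique(level='PERSON_ID')
}
print(data_dict[76129][1952])
```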

Upvotes: 2

BENY

Reputation: 323226

We need to set YEAR as the index first, then use groupby with to_dict('index') to create the multi-level dict:

d = df.set_index('YEAR').groupby('PERSON_ID').apply(lambda x: x.drop('PERSON_ID', axis=1).to_dict('index')).to_dict()

Sample output

d[76129]
{1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883}, 1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508}, 1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}}
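The one-liner relies on `groupby(...).apply` passing the grouping column PERSON_ID into the lambda. pandas 2.2 deprecates that behaviour (it emits a DeprecationWarning), and later versions may exclude the column, in which case the `drop` call could fail. A sketch of the same groupby plus `to_dict('index')` idea that iterates over the groups directly and so avoids `apply`; the frame is a hypothetical excerpt, named `df` as in this answer.

```python
import pandas as pd

# Hypothetical excerpt of the question's data, named df as in this answer
df = pd.DataFrame({
    'PERSON_ID': [76129, 76224, 76129, 76224],
    'YEAR': [1951, 1951, 1952, 1952],
    'IN': [3.77551684175021, 9.3791597299824, 3.67945267132245, 2.43404429471111],
    'OUT': [6.02818626979883, 9.53608578598666, 1.7685974058508, 7.97540821827656],
})

# Iterating a groupby yields (key, sub-frame) pairs; each sub-frame still
# carries the PERSON_ID column, so it is dropped before to_dict('index').
d = {
    pid: g.drop(columns='PERSON_ID').to_dict('index')
    for pid, g in df.set_index('YEAR').groupby('PERSON_ID')
}
print(d[76129])
```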

Upvotes: 2
