Reputation: 394
I have a dataframe (below) that I need to turn into a nested dict keyed by PERSON_ID, then YEAR. I've been able to do it this way:
frame = frame.T.to_dict('dict')
But unfortunately I need it structured like this:
{76129: {1951: 'IN': 3.77551684175021, 'OUT': 6.02818626979883,
1952: 'IN': 3.67945267132245, 'OUT': 1.7685974058508,
1953: 'IN': 3.53030183426851, 'OUT': 0.409577500579766}}
If anyone has ideas on how to approach this I would appreciate it.
 | PERSON_ID | YEAR | IN | OUT |
---|---|---|---|---|
0 | 76129 | 1951 | 3.77551684175021 | 6.02818626979883 |
1 | 76224 | 1951 | 9.3791597299824 | 9.53608578598666 |
2 | 76250 | 1951 | 0.729347478193212 | 5.74296130666972 |
3 | 76322 | 1951 | 0.922030969294425 | 8.95933733613574 |
4 | 76129 | 1952 | 3.67945267132245 | 1.7685974058508 |
5 | 76224 | 1952 | 2.43404429471111 | 7.97540821827656 |
6 | 76250 | 1952 | 7.26162056498856 | 9.76505935514356 |
7 | 76322 | 1952 | 8.66970822529531 | 7.50026191441197 |
8 | 76129 | 1953 | 3.53030183426851 | 0.409577500579766 |
9 | 76224 | 1953 | 3.45390554224515 | 3.20774562896629 |
10 | 76250 | 1953 | 6.63976713572943 | 5.48027529875715 |
11 | 76322 | 1953 | 7.87048287939222 | 0.610433799575476 |
Upvotes: 1
Views: 288
Reputation: 11603
Assuming you actually want a nested dictionary like this (note the extra braces):
{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
... etc.
}
Here is a step-by-step approach.
First, create a dataframe with the desired (PERSON_ID, YEAR) multi-index:
frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()
print(frame_sorted)
Output:
                      IN       OUT
PERSON_ID YEAR
76129     1951  3.775517  6.028186
          1952  3.679453  1.768597
          1953  3.530302  0.409578
... etc.
Then, create the nested dict using a nested dictionary comprehension:
person_ids = frame_sorted.index.levels[0]
data_dict = {person: {idx: data.to_dict() for idx, data in frame_sorted.loc[person].iterrows()}
for person in person_ids}
print(data_dict)
Output:
{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
...etc.
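For anyone who wants to run the two steps end to end, here is a minimal self-contained sketch. It assumes the dataframe is called frame, as in the question, and types in only the rows for two of the four persons by hand, so treat the data as illustrative rather than complete.

```python
import pandas as pd

# Rows for two of the question's four persons, typed in by hand (illustrative subset).
frame = pd.DataFrame({
    "PERSON_ID": [76129, 76224, 76129, 76224, 76129, 76224],
    "YEAR": [1951, 1951, 1952, 1952, 1953, 1953],
    "IN": [3.77551684175021, 9.3791597299824, 3.67945267132245,
           2.43404429471111, 3.53030183426851, 3.45390554224515],
    "OUT": [6.02818626979883, 9.53608578598666, 1.7685974058508,
            7.97540821827656, 0.409577500579766, 3.20774562896629],
})

# Step 1: move PERSON_ID and YEAR into a sorted two-level index.
frame_sorted = frame.set_index(["PERSON_ID", "YEAR"]).sort_index()

# Step 2: outer loop over persons, inner loop over that person's yearly rows.
data_dict = {
    person: {year: row.to_dict()
             for year, row in frame_sorted.loc[person].iterrows()}
    for person in frame_sorted.index.levels[0]
}

print(data_dict[76129][1951])
```

One design note: index.levels[0] lists every PERSON_ID the index was built with, so if the frame has been filtered after set_index, frame_sorted.index.get_level_values(0).unique() may be the safer source of keys.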
Upvotes: 2
Reputation: 323226
We need to first set the index, then use groupby with to_dict in order to create the multi-level dict:
d = df.set_index('YEAR').groupby('PERSON_ID').apply(lambda x : x.drop('PERSON_ID',axis = 1).to_dict('index')).to_dict()
Sample output
d[76129]
{1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883}, 1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508}, 1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}}
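As a hedged variant of the same idea: newer pandas releases have changed how groupby.apply treats the grouping column, so the drop('PERSON_ID') step above may warn or fail depending on your version. The sketch below sidesteps apply by iterating over the groups directly. It assumes the same column names as the question and uses a hand-typed subset of the rows.

```python
import pandas as pd

# Hand-typed subset of the question's table (two persons, three years each).
df = pd.DataFrame({
    "PERSON_ID": [76129, 76224, 76129, 76224, 76129, 76224],
    "YEAR": [1951, 1951, 1952, 1952, 1953, 1953],
    "IN": [3.77551684175021, 9.3791597299824, 3.67945267132245,
           2.43404429471111, 3.53030183426851, 3.45390554224515],
    "OUT": [6.02818626979883, 9.53608578598666, 1.7685974058508,
            7.97540821827656, 0.409577500579766, 3.20774562896629],
})

# For each person, index that person's rows by YEAR and keep only the value
# columns; to_dict("index") then gives {year: {"IN": ..., "OUT": ...}}.
d = {
    person: group.set_index("YEAR")[["IN", "OUT"]].to_dict("index")
    for person, group in df.groupby("PERSON_ID")
}

print(d[76129])
```

Note that to_dict("index") needs unique index values, so this relies on each person having at most one row per year.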
Upvotes: 2