madst

Reputation: 93

Pandas: Looking to avoid a for loop when creating a nested dictionary

Here is my data:

df:

id sub_id
A  1
A  2
B  3
B  4

and I have the following array (called arrays in the code below), with one row per dataframe row:

[[1,2],
[2,5],
[1,4],
[7,8]]

Here is my code:

from collections import defaultdict

sub_id_array_dict = defaultdict(dict)
for i, s, a in zip(df['id'].to_list(), df['sub_id'].to_list(), arrays):
    sub_id_array_dict[i][s] = a

Now, my actual dataframe includes a total of 100M rows (unique sub_id) with 500K unique ids. Ideally, I'd like to avoid a for loop.

Any help would be much appreciated.

Upvotes: 0

Views: 43

Answers (1)

Vishnudev Krishnadas

Reputation: 10960

Assuming the arrays variable has the same number of rows as the DataFrame, assign it as a new column:

df['value'] = arrays
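The assignment above works when arrays is a list of lists. If it is instead a 2-D NumPy array (an assumption; the question does not say which), pandas generally rejects assigning it to a single column, and wrapping it in list() is one way around that. A minimal sketch using the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": ["A", "A", "B", "B"], "sub_id": [1, 2, 3, 4]})

# Assumption: arrays is a 2-D ndarray rather than a list of lists
arrays = np.array([[1, 2], [2, 5], [1, 4], [7, 8]])

# list() splits the 2-D array into one 1-D array per row,
# so each cell of the new column holds a single row of arrays
df["value"] = list(arrays)
```

Each cell then holds a 1-D NumPy array instead of a Python list, so the final dictionary values will be arrays as well.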

Then group by id, build the inner dictionary for each group, and convert the result to a dictionary:

df.groupby('id').apply(lambda x: dict(zip(x.sub_id, x.value))).to_dict()

Output:

{'A': {1: [1, 2], 2: [2, 5]}, 'B': {3: [1, 4], 4: [7, 8]}}
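For reference, here is a self-contained sketch that reproduces this result on the sample data from the question. It iterates over the groups with a dict comprehension instead of groupby().apply(); this is a variation, not the code from the answer, and the name nested is only illustrative. Either way, a Python-level function still runs once per id (about 500K times on the real data), so the loop is reduced from rows to groups rather than removed entirely.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": ["A", "A", "B", "B"], "sub_id": [1, 2, 3, 4]})
arrays = [[1, 2], [2, 5], [1, 4], [7, 8]]

# One list per row, aligned with the DataFrame by position
df["value"] = arrays

# Build one inner dict per id: {sub_id: array}
nested = {
    key: dict(zip(group["sub_id"], group["value"]))
    for key, group in df.groupby("id")
}

print(nested)
```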

Upvotes: 1
