Converting a dataframe into dict values with duplicate keys

Question

I have a dataframe like this:

   col1   col2    col3  col4   action_id
0   1      2        2     0       a, apple
1   1      2        3     5       b, apple
2   0.2   0.3       8     1       c, apple
3   0.2   0.02      1     2       a, apple
4   11     11       22    11      b, apple

I want to convert this dataframe into a dict, with action_id as the key and the other columns as the values.

I want my output in this manner:

{(1, 'a', 'apple'): array([[1, 2, 2, 0]]),
(1, 'b', 'apple'): array([[1, 2, 3, 5]]),
(1, 'c', 'apple'): array([[0.2, 0.3, 8, 1]]),
(2, 'a', 'apple'): array([[0.2, 0.02, 1, 2]]),
(2, 'b', 'apple'): array([[11, 11, 22, 11]])}

I have tried this method, where var is my dataframe:

data2d = var.set_index('action_id').T.to_dict('list')

However, duplicate keys overwrite each other, so the result only keeps the last row for each key. Is there any way to keep the duplicate keys, each with its own values? This is what I get:

{('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
('b', 'apple'): array([[11, 11, 22, 11]])}

Edit

I made a small change and added one more element to action_id, so my dataframe now looks like this:

   col1   col2    col3  col4   action_id
0   1      2        2     0       1, a, apple
1   1      2        3     5       1, b, apple
2   0.2   0.3       8     1       1, c, apple
3   0.2   0.02      1     2       2, a, apple
4   11     11       22    11      2, b, apple

But I still get the same issue: only the last values are kept.

{(1, 'c', 'apple'): array([[0.2, 0.3, 8, 1]]),
(2, 'a', 'apple'): array([[0.2, 0.02, 1, 2]]),
(2, 'b', 'apple'): array([[11, 11, 22, 11]])}

mozway · Accepted Answer

It is impossible to have duplicate keys in a Python dictionary.
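
A quick illustration of why only the last row survives: assigning to a key that already exists replaces its value. The key and values are taken from the question's sample rows; the name `d` is only for illustration.

```python
# Two assignments to the same key: the second replaces the first.
d = {}
d[('a', 'apple')] = [1, 2, 2, 0]
d[('a', 'apple')] = [0.2, 0.02, 1, 2]

print(len(d))              # 1
print(d[('a', 'apple')])   # [0.2, 0.02, 1, 2]
```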

If you want, you can aggregate at the list/array level:

var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')

Output:

{('a', 'apple'): [[1.0, 0.2], [2.0, 0.02], [2, 1], [0, 2]],
 ('b', 'apple'): [[1.0, 11.0], [2.0, 11.0], [3, 22], [5, 11]],
 ('c', 'apple'): [[0.2], [0.3], [8], [1]]}

Or:

var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()

Output:

{('a', 'apple'): array([[1.  , 2.  , 2.  , 0.  ],
                        [0.2 , 0.02, 1.  , 2.  ]]),
 ('b', 'apple'): array([[ 1.,  2.,  3.,  5.],
                        [11., 11., 22., 11.]]),
 ('c', 'apple'): array([[0.2, 0.3, 8. , 1. ]])}
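
For anyone who wants to try this out, here is a minimal self-contained sketch. It assumes that action_id holds tuples such as ('a', 'apple'), which the question does not state explicitly. The grouping is written as a dict comprehension rather than the apply call above, and the names `value_cols`, `grouped`, `occurrence` and `unique` are hypothetical. The second part is not one of the approaches above: it is one possible way to get keys like those in the question's desired output, by prefixing each key with an occurrence counter from groupby(...).cumcount() so that no two keys collide.

```python
import numpy as np
import pandas as pd

# Assumption: action_id holds tuples such as ('a', 'apple').
var = pd.DataFrame({
    'col1': [1, 1, 0.2, 0.2, 11],
    'col2': [2, 2, 0.3, 0.02, 11],
    'col3': [2, 3, 8, 1, 22],
    'col4': [0, 5, 1, 2, 11],
    'action_id': [('a', 'apple'), ('b', 'apple'), ('c', 'apple'),
                  ('a', 'apple'), ('b', 'apple')],
})
value_cols = ['col1', 'col2', 'col3', 'col4']

# One 2D array per key, stacking every row that shares that key.
grouped = {
    key: group[value_cols].to_numpy()
    for key, group in var.groupby('action_id')
}

# Alternative: make each key unique by prefixing an occurrence counter
# (1 for the first row with a given action_id, 2 for the second, ...).
occurrence = var.groupby('action_id').cumcount() + 1
unique = {
    (int(n), *key): row.reshape(1, -1)
    for n, key, row in zip(occurrence, var['action_id'],
                           var[value_cols].to_numpy())
}

print(grouped[('a', 'apple')])
print(unique[(2, 'a', 'apple')])
```

With the counter prefix, the dict has one entry per row, at the cost of having to know the occurrence number when looking a key up.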
