Reputation: 68
I have a data frame like this:
   col1  col2  col3  col4  action_id
0  1     2     2     0     a, apple
1  1     2     3     5     b, apple
2  0.2   0.3   8     1     c, apple
3  0.2   0.02  1     2     a, apple
4  11    11    22    11    b, apple
I want to convert this data frame into a dict, with action_id as the key and the remaining columns as the values.
The output should look like this:
{(1, 'a', 'apple'): array([[1, 2, 2, 0]]),
(1, 'b', 'apple'): array([[1, 2, 3, 5]]),
(1, 'c', 'apple'): array([[0.2, 0.3, 8, 1]]),
(2, 'a', 'apple'): array([[0.2, 0.02, 1, 2]]),
(2, 'b', 'apple'): array([[11, 11, 22, 11]])}
I tried the following, where var is my DataFrame:
data2d = var.set_index('action_id').T.to_dict('list')
However, this overwrites the values for duplicate keys, so only the last row for each key is returned. Is there a way to keep every row, even when keys repeat? This is what I get:
{('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
('b', 'apple'): array([[11, 11, 22, 11]])}
Edit
I made a small change and added one more element to action_id, so my frame now looks like this:
   col1  col2  col3  col4  action_id
0  1     2     2     0     1, a, apple
1  1     2     3     5     1, b, apple
2  0.2   0.3   8     1     1, c, apple
3  0.2   0.02  1     2     2, a, apple
4  11    11    22    11    2, b, apple
But I still have the same problem: only the last values are kept.
{(1, 'c', 'apple'): array([[0.2, 0.3, 8, 1]]),
(2, 'a', 'apple'): array([[0.2, 0.02, 1, 2]]),
(2, 'b', 'apple'): array([[11, 11, 22, 11]])}
Upvotes: 0
Views: 93
Reputation: 765
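# Split each action_id string into a tuple to use as the dict key
# (splitting on "," alone keeps the leading space, e.g. ' apple').
k = df1.action_id.str.split(",").map(tuple)
# Take the columns up to col4 of each row as a NumPy array.
v = df1.loc[:, :'col4'].apply(lambda ss: ss.to_numpy(), axis=1)
dict(zip(k, v))
Output: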
{('a', ' apple'): array([0.2 , 0.02, 1. , 2. ]),
('b', ' apple'): array([11., 11., 22., 11.]),
('c', ' apple'): array([0.2, 0.3, 8. , 1. ])}
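For reference, here is a self-contained sketch of the same zip-based approach. The frame df1 is a hypothetical reconstruction of the one in the question, and it assumes action_id is stored as a plain string such as "a, apple". Each part of the key is stripped, so the leading space seen above disappears. As in the output above, a repeated key still keeps only its last row.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame;
# action_id is assumed to be a string column.
df1 = pd.DataFrame({
    "col1": [1, 1, 0.2, 0.2, 11],
    "col2": [2, 2, 0.3, 0.02, 11],
    "col3": [2, 3, 8, 1, 22],
    "col4": [0, 5, 1, 2, 11],
    "action_id": ["a, apple", "b, apple", "c, apple", "a, apple", "b, apple"],
})

# Split on the comma and strip whitespace from every part of the key.
keys = [tuple(part.strip() for part in s.split(",")) for s in df1["action_id"]]

# One NumPy row per record, taken from the numeric columns only.
rows = df1[["col1", "col2", "col3", "col4"]].to_numpy()

# Later rows replace earlier ones that share the same key.
result = dict(zip(keys, rows))
print(result)
```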
Upvotes: 0
Reputation: 261850
A Python dictionary cannot contain duplicate keys.
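A minimal plain-Python illustration of that behavior, using made-up values taken from the question's rows: assigning to a key that already exists replaces the stored value rather than adding a second entry.

```python
d = {}
d[("a", "apple")] = [1, 2, 2, 0]
# Same key again: the previous list is replaced, not kept alongside.
d[("a", "apple")] = [0.2, 0.02, 1, 2]

print(d)       # {('a', 'apple'): [0.2, 0.02, 1, 2]}
print(len(d))  # 1
```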
If you want, you can aggregate at the list/array level:
var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')
Output:
{('a', 'apple'): [[1.0, 0.2], [2.0, 0.02], [2, 1], [0, 2]],
('b', 'apple'): [[1.0, 11.0], [2.0, 11.0], [3, 22], [5, 11]],
('c', 'apple'): [[0.2], [0.3], [8], [1]]}
Or:
var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()
Output:
{('a', 'apple'): array([[1.  , 2.  , 2.  , 0.  ],
                        [0.2 , 0.02, 1.  , 2.  ]]),
 ('b', 'apple'): array([[ 1.,  2.,  3.,  5.],
                        [11., 11., 22., 11.]]),
 ('c', 'apple'): array([[0.2, 0.3, 8. , 1. ]])}
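If one entry per row is needed instead, as in the desired output in the question, another option is to make each key unique by prefixing it with an occurrence counter. The sketch below is only one possible approach and rests on assumptions: var is a hypothetical reconstruction of the frame, action_id is assumed to hold tuples, and the leading number in the desired keys is assumed to mean "n-th occurrence of this key".

```python
from collections import Counter

import pandas as pd

# Hypothetical reconstruction of the question's frame;
# action_id is assumed to hold tuples here.
var = pd.DataFrame({
    "col1": [1, 1, 0.2, 0.2, 11],
    "col2": [2, 2, 0.3, 0.02, 11],
    "col3": [2, 3, 8, 1, 22],
    "col4": [0, 5, 1, 2, 11],
    "action_id": [("a", "apple"), ("b", "apple"), ("c", "apple"),
                  ("a", "apple"), ("b", "apple")],
})

value_cols = ["col1", "col2", "col3", "col4"]
seen = Counter()  # how many times each key has appeared so far
result = {}
for key, row in zip(var["action_id"], var[value_cols].to_numpy()):
    seen[key] += 1
    # Prefix the key with its occurrence number so repeated keys stay distinct;
    # reshape to (1, 4) to match the 2-D arrays in the desired output.
    result[(seen[key], *key)] = row.reshape(1, -1)

for k, v in result.items():
    print(k, v)
```

This keeps all five rows, at the cost of keys that no longer match action_id exactly.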
Upvotes: 1