Reputation: 33
So I have a DataFrame in which I want to combine some rows via a groupby.
Sample DF:
col_a col_b col_c col_e col_f
0 1 0 1 -1.0 2
1 1 1 3 0.0 3
2 1 2 4 NaN 3
3 2 0 3 4.0 6
4 3 0 3 4.0 2
And what I want the output to look like is this...
df.groupby('col_a')
col_a, col_c ...col_f
1 {0: 1, 1: 3, 2:4} {0:2,1:3,2:3}
2 .... ....
3 .... ....
Basically: group by col_a, then aggregate the values of col_c through col_f into dictionaries, where col_b supplies the dictionary keys.
I'm not sure whether there's a way to do this with groupby and some kind of agg function, or whether I'm resigned to writing a Python function that iterates over every row, perhaps with .apply. Ideas?
Edit:
Original:
col_a col_b col_c col_e col_f
0 1 A 1 -1.0 2
1 1 B 3 0.0 3
2 1 C 4 NaN 3
3 2 A 3 4.0 6
4 3 A 3 4.0 2
Desired:
col_a, col_c ...col_f
1 {A: 1, B: 3, C:4} {A:2,B:3,C:3}
2 .... ....
3 {A:3} {A:2}
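For concreteness, the row-iteration fallback I mentioned would look something like this (a rough sketch using the edited sample data above; I'd rather avoid it if groupby can do the job):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 3],
    "col_b": ["A", "B", "C", "A", "A"],
    "col_c": [1, 3, 4, 3, 3],
    "col_e": [-1.0, 0.0, np.nan, 4.0, 4.0],
    "col_f": [2, 3, 3, 6, 2],
})

value_cols = ["col_c", "col_e", "col_f"]

# Build one dict per (col_a group, value column), keyed by col_b.
out = {}
for _, row in df.iterrows():
    group = out.setdefault(row["col_a"], {c: {} for c in value_cols})
    for c in value_cols:
        group[c][row["col_b"]] = row[c]

# Rows keyed by col_a, columns col_c..col_f, cells holding the dicts.
result = pd.DataFrame.from_dict(out, orient="index")
result.index.name = "col_a"
```

This works, but iterating rows with iterrows is slow on anything non-trivial, hence the question.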
Upvotes: 1
Views: 743
Reputation: 375415
I don't think you want to do this, rarely is there a need for a DataFrame of dicts. You can do all the same operations (and more) using a DataFrame with these as index/columns in a MultiIndex:
In [11]: res = df.set_index(["col_a", "col_b"])
In [12]: res
Out[12]:
col_c col_e col_f
col_a col_b
1 0 1 -1.0 2
1 3 0.0 3
2 4 NaN 3
2 0 3 4.0 6
3 0 3 4.0 2
Now you can index into the DataFrame by col_a, col_b, and any other column (as if it were a dict):
In [13]: res.loc[(1, 2), "col_c"]
Out[13]: 4
In [14]: res.loc[1, "col_c"]
Out[14]:
col_b
0 1
1 3
2 4
Name: col_c, dtype: int64
etc. This is going to be more efficient than using a dict inside a DataFrame...
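That said, if you genuinely need the dict-of-dicts layout from the question, one sketch is a groupby with a per-group Series of dicts (column names taken from the question; this keeps object-dtype columns, with all the downsides that implies):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 3],
    "col_b": ["A", "B", "C", "A", "A"],
    "col_c": [1, 3, 4, 3, 3],
    "col_e": [-1.0, 0.0, np.nan, 4.0, 4.0],
    "col_f": [2, 3, 3, 6, 2],
})

value_cols = ["col_c", "col_e", "col_f"]

# For each col_a group, build one dict per value column, keyed by col_b.
result = df.groupby("col_a").apply(
    lambda g: pd.Series({c: dict(zip(g["col_b"], g[c])) for c in value_cols})
)
```

result.loc[1, "col_c"] is then the plain dict {'A': 1, 'B': 3, 'C': 4}, but you lose vectorized operations on those columns, which is why I'd reach for the MultiIndex version first.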
Upvotes: 1