Reputation: 111
import pandas as pd
df = pd.DataFrame({'col1': [1,1,1,1,1,1, 2,2,2,2,2,2], 'col2': ['in', 'out','in', 'out','in', 'out','in', 'out','in', 'out','in', 'out'], 'col3':['A','B','C','D','E','F','G','H','I','J','K','L']})
I'm looking for an efficient way to transform the above dataframe to a table like this:
| col1 | col2 | col3 |
| ---- | ---- | ---- |
| 1 | in | 'A','C','E' |
| 1 | out | 'B','D','F' |
| 2 | in | 'G','I','K' |
| 2 | out | 'H','J','L' |
Upvotes: 0
Views: 68
Reputation: 825
You can group by col1 and col2 and use apply(list) to collect the col3 values of each group into a list, like this:
df.groupby(['col1','col2'])['col3'].apply(list).reset_index()
Output:
   col1 col2       col3
0     1   in  [A, C, E]
1     1  out  [B, D, F]
2     2   in  [G, I, K]
3     2  out  [H, J, L]
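For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the dataframe from the question in a shorter form (the `* 6` and `list("...")` shortcuts are just my way of writing the same data), and the variable name `lists` is only illustrative.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    "col1": [1] * 6 + [2] * 6,
    "col2": ["in", "out"] * 6,
    "col3": list("ABCDEFGHIJKL"),
})

# One row per (col1, col2) pair; col3 holds a Python list of the group's values.
lists = df.groupby(["col1", "col2"])["col3"].apply(list).reset_index()
print(lists)
```

Note that each cell in col3 is a real list object here, which is handy if you need to process the values further but less so if you only want a display string.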
Upvotes: 1
Reputation: 14949
You can use groupby with agg and join the col3 values of each group into a single comma-separated string:
result = df.groupby(['col1', 'col2'], as_index=False).agg({'col3': ','.join})
OUTPUT:
   col1 col2   col3
0     1   in  A,C,E
1     1  out  B,D,F
2     2   in  G,I,K
3     2  out  H,J,L
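If the quotes shown in the question's table are actually wanted in the result, one possible variant is to pass a small helper to agg instead of ','.join. This is only a sketch: `quote_join` is a hypothetical helper name, not part of pandas, and the dataframe is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    "col1": [1] * 6 + [2] * 6,
    "col2": ["in", "out"] * 6,
    "col3": list("ABCDEFGHIJKL"),
})

def quote_join(values):
    """Wrap each value in single quotes and join with commas (hypothetical helper)."""
    return ",".join(f"'{v}'" for v in values)

# as_index=False keeps col1 and col2 as regular columns in the result.
quoted = df.groupby(["col1", "col2"], as_index=False).agg({"col3": quote_join})
print(quoted)
```

The result is still a plain string per group, so it behaves like the ','.join version and differs only in the formatting of each cell.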
Upvotes: 4