Reputation: 351
I am a beginner in Pandas so please bear with me. I know this is a very basic question/
I am working with pandas on the following dataframe :
x y w
1 2 5
1 2 7
3 4 3
5 4 8
3 4 5
5 9 9
And I want the following output :
x y w
1 2 5,7
3 4 2,5
5 4 8
5 9 9
Can Anyone tell me how to do it using pandas groupby.
Upvotes: 1
Views: 1568
Reputation: 393883
You can groupby
on columns 'x' and 'y' and apply a lambda on the 'w' column, if required you need to cast the dtype
using astype
:
In [220]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str)))
Out[220]:
x y
1 2 5,7
3 4 3,5
5 4 8
9 9
Name: w, dtype: object
In [221]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str))).reset_index()
Out[221]:
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
EDIT
on your modified sample:
In [237]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.unique().astype(str))).reset_index()
Out[237]:
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
Upvotes: 1
Reputation: 862406
You can use groupby
with apply
join
:
#if type of column w is not string, convert it
print type(df.at[0,'w'])
<type 'numpy.int64'>
df['w'] = df['w'].astype(str)
print df.groupby(['x','y'])['w'].apply(','.join).reset_index()
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
If you have duplicates, use drop_duplicates
:
print df
x y w
0 1 2 5
1 1 2 5
2 1 2 5
3 1 2 7
4 3 4 3
5 5 4 8
6 3 4 5
7 5 9 9
df['w'] = df['w'].astype(str)
print df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.drop_duplicates()))
.reset_index()
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
Or modified EdChum solution:
print df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str).drop_duplicates()))
.reset_index()
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
Upvotes: 1