user324
user324

Reputation: 351

Pandas dataframe groupby

I am a beginner in Pandas so please bear with me. I know this is a very basic question/

I am working with pandas on the following dataframe :

x      y             w  

1      2             5                 
1      2             7         
3      4             3        
5      4             8    
3      4             5    
5      9             9   

And I want the following output :

x   y   w   

1   2   5,7    
3   4   2,5    
5   4   8    
5   9   9

Can Anyone tell me how to do it using pandas groupby.

Upvotes: 1

Views: 1568

Answers (2)

EdChum
EdChum

Reputation: 393883

You can groupby on columns 'x' and 'y' and apply a lambda on the 'w' column, if required you need to cast the dtype using astype:

In [220]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str)))

Out[220]:
x  y
1  2    5,7
3  4    3,5
5  4      8
   9      9
Name: w, dtype: object

In [221]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str))).reset_index()

Out[221]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

EDIT

on your modified sample:

In [237]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.unique().astype(str))).reset_index()

Out[237]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

Upvotes: 1

jezrael
jezrael

Reputation: 862406

You can use groupby with apply join:

#if type of column w is not string, convert it
print type(df.at[0,'w'])
<type 'numpy.int64'>

df['w'] = df['w'].astype(str)

print df.groupby(['x','y'])['w'].apply(','.join).reset_index()
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

If you have duplicates, use drop_duplicates:

print df
   x  y  w
0  1  2  5
1  1  2  5
2  1  2  5
3  1  2  7
4  3  4  3
5  5  4  8
6  3  4  5
7  5  9  9

df['w'] = df['w'].astype(str)
print df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.drop_duplicates()))
        .reset_index()

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

Or modified EdChum solution:

print df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str).drop_duplicates()))
        .reset_index()

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

Upvotes: 1

Related Questions