Pandas tuples groupby aggregation

Question

Certain columns of my data frame contain tuples. Whenever I aggregate via groupby, those columns do not appear in the resulting data frame unless I select them explicitly.

Example:

df = pd.DataFrame()
df['A'] = [1, 2, 1, 2]
df['B'] = [1, 2, 3, 4]
df['C'] = map(lambda s: (s,), df['B'])
print df
   A  B     C
0  1  1  (1,)
1  2  2  (2,)
2  1  3  (3,)
3  2  4  (4,)

If I aggregate as follows, column C does not appear in the result:

print df.groupby('A').sum()
   B
A   
1  4
2  6

But if I select it explicitly, it appears as expected:

print df[['A', 'C']].groupby('A').sum()
        C
A        
1  (1, 3)
2  (2, 4)

Could you please tell me why the C column didn't appear in the first case?

I would like it to be included by default.

jezrael · Accepted Answer

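Because sum() here aggregates only column B, not column C. Column B is numeric, while column C holds tuples (object dtype), so in this pandas version the default aggregation leaves C out whenever a numeric column is present. Changing a value in B shows that the result is computed from B alone: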

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['A'] = [1, 2, 1, 2]
df['B'] = [1, 2, 3, 4]
df['C'] = map(lambda s: (s,), df['B'])
print df

df.at[0,'B'] = 10
print df
   A   B     C
0  1  10  (1,)
1  2   2  (2,)
2  1   3  (3,)
3  2   4  (4,)

print df.groupby('A').sum()
    B
A    
1  13
2   6

print df.groupby('A')['B'].sum()
1    13
2     6
Name: B, dtype: int64
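
If the goal is to get column C in the result alongside B, one option is to name an aggregation for each column instead of relying on the default. The following is a minimal sketch, assuming Python 3 and a recent pandas; the helper name concat_tuples is hypothetical and not part of the original post or of pandas.

```python
import pandas as pd

# Same data as in the question, built for Python 3
df = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [1, 2, 3, 4]})
df['C'] = [(b,) for b in df['B']]

def concat_tuples(series):
    # Hypothetical helper: join all tuples of one group into a single tuple
    return sum(series, ())

# Name the aggregation per column so that C is not silently left out
result = df.groupby('A').agg({'B': 'sum', 'C': concat_tuples})
print(result)
```

Spelling out the columns this way keeps the result independent of how a given pandas version treats non-numeric columns in a default sum().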
