Reputation: 1147
Certain columns of my DataFrame contain tuples. Whenever I aggregate with groupby, those columns do not appear in the resulting DataFrame unless I select them explicitly.
Example:
import pandas as pd

df = pd.DataFrame()
df['A'] = [1, 2, 1, 2]
df['B'] = [1, 2, 3, 4]
# wrap each value of B in a one-element tuple
df['C'] = list(map(lambda s: (s,), df['B']))
print(df)
   A  B     C
0  1  1  (1,)
1  2  2  (2,)
2  1  3  (3,)
3  2  4  (4,)
If I do it the following way, column C does not appear in the aggregation:
print(df.groupby('A').sum())
   B
A   
1  4
2  6
But if I select it explicitly, it appears as expected:
print(df[['A', 'C']].groupby('A').sum())
        C
A        
1  (1, 3)
2  (2, 4)
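The (1, 3) in that output is consistent with the tuples simply being added with Python's own +, which concatenates tuples instead of adding numbers. A plain-Python sketch of what the sum for the group A == 1 amounts to (the list literal is a stand-in for the C values of that group, not something pandas exposes):

```python
# C values of the rows where A == 1 (rows 0 and 2)
group_a1 = [(1,), (3,)]

# sum() normally starts from 0; starting from the empty tuple lets + concatenate
combined = sum(group_a1, ())
print(combined)  # (1, 3)
```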
Could you please tell me why column C didn't appear in the first case? I would like it to be included by default.
Upvotes: 2
Views: 2478
Reputation: 862681
Because the sum you see is taken over column B, not column C. Changing one value in B shows that only B is being summed:
import pandas as pd

df = pd.DataFrame()
df['A'] = [1, 2, 1, 2]
df['B'] = [1, 2, 3, 4]
df['C'] = list(map(lambda s: (s,), df['B']))
print(df)
# set B in row 0 to 10 so the change is visible in the group sums
df.at[0, 'B'] = 10
print(df)
   A   B     C
0  1  10  (1,)
1  2   2  (2,)
2  1   3  (3,)
3  2   4  (4,)
print(df.groupby('A').sum())
    B
A    
1  13
2   6
print(df.groupby('A')['B'].sum())
1    13
2     6
Name: B, dtype: int64
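If the goal is to always get C in the result, one option (a sketch, not the only way) is to name the aggregation for every column by passing a dict to agg. Column C holds Python tuples, so pandas stores it with the generic object dtype, while A and B are integers. In the pandas versions of that time, groupby reductions such as sum appear to have silently skipped non-numeric columns they could not handle on the fast path, which would explain why C vanished. The lambda using sum(s, ()) is my own choice for concatenating the tuples; swap in whatever combination you actually need:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [1, 2, 3, 4]})
df['C'] = [(s,) for s in df['B']]

# A and B get integer dtypes; C, holding tuples, gets the generic object dtype
print(df.dtypes)

# Spell out the aggregation per column so that C cannot be dropped:
# B is summed numerically, C's tuples are concatenated starting from ()
result = df.groupby('A').agg({'B': 'sum', 'C': lambda s: sum(s, ())})
print(result)
```

Newer pandas releases changed how non-numeric columns are treated in groupby reductions, so naming the per-column functions also keeps the result from depending on the version you run.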
Upvotes: 1