Reputation: 96
I have a pandas DataFrame that I want to group by two columns and aggregate with the sum function.
When I use just data['col1'].sum(), I get a different result than after the groupby aggregation:
data_grouped = data[['col2','col3','col1']].groupby(['col2','col3'])['col1'].sum()
I have already tried using dropna=False, but I get the same result. The sum after the aggregation is lower than the simple sum over the dataset.
Where could the mistake be?
Upvotes: 0
Views: 242
Reputation: 92
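import numpy as np

# Check for missing values in 'col1'
missing_values = data['col1'].isnull().sum()
print("Number of missing values in 'col1':", missing_values)

# Option 1: drop rows where 'col1' is missing, then group and sum
data_cleaned = data.dropna(subset=['col1'])
data_grouped = data_cleaned.groupby(['col2', 'col3'])['col1'].sum()
# Option 2: replace missing 'col1' values with 0, then group and sum
data_filled = data.fillna({'col1': 0})
data_grouped = data_filled.groupby(['col2', 'col3'])['col1'].sum()
# Option 3: keep the data as-is and aggregate with a NaN-aware sum
data_grouped = data.groupby(['col2', 'col3'])['col1'].agg(np.nansum)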
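It may also be worth checking the grouping columns themselves. By default, groupby drops rows whose key is NaN (dropna=True), so missing values in 'col2' or 'col3' can make the grouped total lower than the plain sum. The sketch below uses a small made-up DataFrame (the values are hypothetical, not your data) and assumes pandas 1.1 or newer, where dropna is a parameter of groupby itself rather than of sum:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has a missing grouping key in 'col2'
data = pd.DataFrame({
    "col2": ["a", "a", np.nan, "b"],
    "col3": ["x", "x", "y", "y"],
    "col1": [1.0, 2.0, 4.0, 8.0],
})

# Plain column sum uses every row
plain_total = data["col1"].sum()

# Default dropna=True: rows with NaN in any grouping key are left out
grouped_default = data.groupby(["col2", "col3"])["col1"].sum()

# dropna=False keeps NaN keys as their own group
grouped_keep_nan = data.groupby(["col2", "col3"], dropna=False)["col1"].sum()

# Rows that the default groupby silently excludes
missing_key_rows = data[data[["col2", "col3"]].isna().any(axis=1)]

print("plain sum:", plain_total)
print("grouped sum (default):", grouped_default.sum())
print("grouped sum (dropna=False):", grouped_keep_nan.sum())
print(missing_key_rows)
```

If missing_key_rows is non-empty on your real data, those rows likely account for the gap between the two totals.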
Upvotes: 1