Outcast

Reputation: 5117

Column missing after Pandas GroupBy (not the GroupBy column)

I am using the following source code:

import numpy as np
import pandas as pd


# Load data
data = pd.read_csv('C:/Users/user/Desktop/Daily_to_weekly.csv', keep_default_na=True)

print(data.shape[1])
# 18

# Create weekly data
# Aggregate by calculating the sum per store for every week
data_weekly = data.groupby(['STORE_ID', 'WEEK_NUMBER'], as_index=False).agg('sum')

print(data_weekly.shape[1])
# 17 

As you can see, a column goes missing after the aggregation, and it is neither of the GroupBy columns ('STORE_ID', 'WEEK_NUMBER').

Why is this happening and how can I fix it?

Upvotes: 3

Views: 3862

Answers (1)

Phoenix Jauregui

Reputation: 31

I've run into this problem numerous times before. pandas is dropping one of your columns because it has identified it as a "nuisance" column, meaning the aggregation you are attempting (here, 'sum') cannot be applied to it; typically this is a non-numeric column. If you wish to preserve this column, I would recommend including it in the groupby.

https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
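A minimal sketch of how you might find the dropped column and keep it. The small DataFrame and the STORE_NAME and SALES columns are made up for illustration, since the real CSV is not shown. Passing numeric_only=True makes the drop explicit, so the example should behave the same on recent pandas releases, which (as far as I know) raise an error instead of silently dropping such columns.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; STORE_NAME plays the role of the dropped column.
data = pd.DataFrame({
    "STORE_ID": [1, 1, 2, 2],
    "WEEK_NUMBER": [1, 1, 1, 2],
    "STORE_NAME": ["A", "A", "B", "B"],
    "SALES": [10.0, 5.0, 7.0, 3.0],
})
keys = ["STORE_ID", "WEEK_NUMBER"]

# Reproduce the drop explicitly: only numeric columns are summed.
weekly = data.groupby(keys, as_index=False).sum(numeric_only=True)

# Compare column lists to see which column went missing.
dropped = [col for col in data.columns if col not in weekly.columns]
print(dropped)

# Option 1: include the dropped column in the groupby keys.
weekly_by_key = data.groupby(keys + dropped, as_index=False).sum()

# Option 2: keep the original keys and choose an aggregation per column
# ('first' for the non-numeric column, 'sum' for everything else).
agg_spec = {
    col: ("first" if col in dropped else "sum")
    for col in data.columns
    if col not in keys
}
weekly_explicit = data.groupby(keys, as_index=False).agg(agg_spec)

print(weekly_by_key.shape[1], weekly_explicit.shape[1])
```

Option 1 only makes sense when the column is constant within each store and week (a store name, for example); otherwise it splits the groups further. Option 2 keeps the grouping unchanged, but you have to decide what a sensible aggregate for the non-numeric column is.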

Upvotes: 3
