William Wade

Reputation: 405

Resample() returning incorrect figures for non-existent dates

I have a data frame in this format:

Date Posted     Receipt Amount  Centre      Brand
07-10-2019      6000.0          Centre 1    Brand 1
07-05-2019      6346.66         Centre 2    Brand 1
03-01-2019      6173.34         Centre 1    Brand 2
11-06-2019      6000.0          Centre 1    Brand 2
13-09-2019      6346.66         Centre 3    Brand 1
07-11-2019      6098.34         Centre 4    Brand 1

I am re-sampling the data for time series forecasting purposes:

import pandas as pd

df=pd.read_csv("File Directory")
df["Receipt Amount"] = df["Receipt Amount"].astype(float)
brands=list((pd.Series(df["Brand"].unique())).dropna())


df['Date Posted'] = pd.DatetimeIndex(df['Date Posted'])
df.index = df['Date Posted']
df=df.drop(["Date Posted"],axis=1)


for brand in brands:
    brand_filter=df['Brand']==brand
    brand_df=df[brand_filter]

    brand_df=brand_df[["Receipt Amount"]]

    brand_df=brand_df.resample('D').sum()   
    brand_df.reset_index(level=0, inplace=True)
    brand_df = brand_df.rename({'Date Posted': 'ds'}, axis=1)
    brand_df = brand_df.rename({'Receipt Amount': 'y'}, axis=1)

However, this returns 0 for some of the daily sums, which I know to be wrong. It also returns rows for days in December, which cannot be right either, because none of the data is more recent than November.

This is the code in its entirety so I am unsure where I have made a mistake.

Upvotes: 0

Views: 367

Answers (1)

William Wade

Reputation: 405

I have now resolved this issue, so here is the solution for future desperate Googlers.

The dates weren't being read in correctly by:

df['Date Posted'] = pd.DatetimeIndex(df['Date Posted'])

Some dates were being read as dd/mm/yyyy, while others were being read as mm/dd/yyyy.
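To illustrate how a misread date leads to both symptoms from the question, here is a minimal sketch. The two date strings and amounts are made up for illustration, and each interpretation is forced with an explicit format so the result does not depend on how a given pandas version guesses:

```python
import pandas as pd

# "12-11-2019" is meant as 12 November (day-first), but it is also a valid
# month-first date. "13-09-2019" can only be read day-first.
intended = pd.to_datetime("12-11-2019", format="%d-%m-%Y")     # 12 November 2019
misread = pd.to_datetime("12-11-2019", format="%m-%d-%Y")      # 11 December 2019
unambiguous = pd.to_datetime("13-09-2019", format="%d-%m-%Y")  # 13 September 2019

# Hypothetical amounts, indexed the way a mixed parse would leave them.
mixed = pd.Series([100.0, 200.0], index=pd.DatetimeIndex([misread, unambiguous]))

# Daily resampling spans the earliest to the latest date in the index and
# fills every day without a row with a sum of 0.
daily = mixed.sort_index().resample("D").sum()

print(daily.index.max())    # the range now reaches into December
print(daily.loc[intended])  # the real date, 12 November, shows 0
```

So the December rows come from the misread date stretching the index, and the real date ends up with a 0 because its amount has moved elsewhere.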

To solve this, parse the column with pd.to_datetime and pass dayfirst=True:

df['Date Posted'] = pd.to_datetime(df['Date Posted'],dayfirst=True)
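As a quick check of the fix, here is a small sketch using the six dates from the sample table in the question. The `format="%d-%m-%Y"` variant is my own addition, assuming every row really uses that layout. It is stricter than `dayfirst=True`, since it raises an error on a string that does not match instead of guessing.

```python
import pandas as pd

# The "Date Posted" values from the sample table in the question.
raw = pd.Series([
    "07-10-2019", "07-05-2019", "03-01-2019",
    "11-06-2019", "13-09-2019", "07-11-2019",
])

# Tell pandas the day comes first for ambiguous strings.
parsed = pd.to_datetime(raw, dayfirst=True)

# Alternative (assumes every row is dd-mm-yyyy): spell the format out.
explicit = pd.to_datetime(raw, format="%d-%m-%Y")

print(parsed.max())                # latest date is in November, as expected
print((parsed == explicit).all())  # both approaches agree on this sample
```

With the dates parsed consistently, resample('D') only covers the real date range, and any remaining zeros are days that genuinely had no receipts.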

Upvotes: 3
