muazfaiz
muazfaiz

Reputation: 5031

Unexpected results from Pandas.DataFrame.resample

I have data set structured like below example for the month of January 2017.

Date    ProductID   ProductType Qty
1.1.2017    1000    101 7
1.1.2017    1001    111 2
1.1.2017    1000    101 1
1.1.2017    1004    107 12
2.1.2017    1000    101 6
2.1.2017    1001    111 5
2.1.2017    1001    111 4
…..         
31.1.2017   1000    101 7
31.1.2017   1001    111 5
31.1.2017   1001    111 7

I want to calculate weekly sales for each product ID with product type 101 and 111 such that my result looks like the following

ProductID|  WeeklyDates|    Sales
1000     | 1.1.2017    |     14
         | 1.8.2017    |     NaN
         | 1.15.2017   |     NaN
         | 1.22.2017   |     NaN
         | 1.29.2017   |      7
-----------------------------------
1001     | 1.1.2017    |     11
         | 1.8.2017    |     NaN
         | 1.15.2017   |     NaN
         | 1.22.2017   |     NaN
         | 1.29.2017   |     12

Here NaN means that I have no data for these dates in the example. Now to get these results I am using the following code

import pandas as pd

df = pd.read_csv('data.csv', encoding = 'latin-1', sep=',')
df['Date'] = pd.to_datetime(df['Date'])
transaction_types = [101, 111]
s_df = df[df['ProductType'].isin(transaction_types)]
res_df = s_df.filter(['Date','ProductID','Qty']) # filter it because I do not want other product type column now
res_df = res_df.set_index('Date').groupby('ProductID').resample('W').sum()
res_df.to_csv('result.csv', sep=';', encoding='latin-1')

It returns me some wierd results. I am getting some dates which I don't even have in the data. I am showing results for only one ID

ProductID|  Date        |ProductID| Qty
1000     |   01/01/2017 |  4000   |  41
1000     |   08/01/2017 |         |
1000     |   15/01/2017 |  33000  |  54
1000     |   22/01/2017 |  87000  |  313
1000     |   29/01/2017 |  79000  |  94
1000     |   05/02/2017 |  36000  |  413
1000     |   12/02/2017 |         | 
1000     |   19/02/2017 |         |
1000     |   26/02/2017 |         |
1000     |   05/03/2017 |  8000   |  78

The results are original and will not match the above example. But the productID are coming 2 times and I think it also summing the productIDs too. Also the sum is not correct. The dates are also going till march and in my data set I have dates of only January. Can someone guide me where are possible problems in my code ? Thanks

Upvotes: 0

Views: 908

Answers (2)

zeros and ones
zeros and ones

Reputation: 61

I got stuck on this same problem and came across this answer. After looking through the pandas documentation I learnt a more flexible way to solve this is to allow pandas to infer the 'datetime' format as follow; df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)

Upvotes: 1

muazfaiz
muazfaiz

Reputation: 5031

I was not giving any date format.For example

df['Date'] = pd.to_datetime(df['Date']) # Not correct
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y') # Correct way

So because of this it was considering months as days and vice versa which was the reason I was getting wrong results.

Upvotes: 1

Related Questions