Reputation: 5031
I have data set structured like below example for the month of January 2017.
Date ProductID ProductType Qty
1.1.2017 1000 101 7
1.1.2017 1001 111 2
1.1.2017 1000 101 1
1.1.2017 1004 107 12
2.1.2017 1000 101 6
2.1.2017 1001 111 5
2.1.2017 1001 111 4
…..
31.1.2017 1000 101 7
31.1.2017 1001 111 5
31.1.2017 1001 111 7
I want to calculate weekly sales for each product ID with product type 101 and 111 such that my result looks like the following
ProductID| WeeklyDates| Sales
1000 | 1.1.2017 | 14
| 1.8.2017 | NaN
| 1.15.2017 | NaN
| 1.22.2017 | NaN
| 1.29.2017 | 7
-----------------------------------
1001 | 1.1.2017 | 11
| 1.8.2017 | NaN
| 1.15.2017 | NaN
| 1.22.2017 | NaN
| 1.29.2017 | 12
Here NaN means that I have no data for these dates in the example. Now to get these results I am using the following code
import pandas as pd
df = pd.read_csv('data.csv', encoding = 'latin-1', sep=',')
df['Date'] = pd.to_datetime(df['Date'])
transaction_types = [101, 111]
s_df = df[df['ProductType'].isin(transaction_types)]
res_df = s_df.filter(['Date','ProductID','Qty']) # filter it because I do not want other product type column now
res_df = res_df.set_index('Date').groupby('ProductID').resample('W').sum()
res_df.to_csv('result.csv', sep=';', encoding='latin-1')
It returns me some wierd results. I am getting some dates which I don't even have in the data. I am showing results for only one ID
ProductID| Date |ProductID| Qty
1000 | 01/01/2017 | 4000 | 41
1000 | 08/01/2017 | |
1000 | 15/01/2017 | 33000 | 54
1000 | 22/01/2017 | 87000 | 313
1000 | 29/01/2017 | 79000 | 94
1000 | 05/02/2017 | 36000 | 413
1000 | 12/02/2017 | |
1000 | 19/02/2017 | |
1000 | 26/02/2017 | |
1000 | 05/03/2017 | 8000 | 78
The results are original and will not match the above example. But the productID are coming 2 times and I think it also summing the productIDs too. Also the sum is not correct. The dates are also going till march and in my data set I have dates of only January. Can someone guide me where are possible problems in my code ? Thanks
Upvotes: 0
Views: 908
Reputation: 61
I got stuck on this same problem and came across this answer. After looking through the pandas documentation I learnt a more flexible way to solve this is to allow pandas to infer the 'datetime' format as follow;
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
Upvotes: 1
Reputation: 5031
I was not giving any date format.For example
df['Date'] = pd.to_datetime(df['Date']) # Not correct
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y') # Correct way
So because of this it was considering months as days and vice versa which was the reason I was getting wrong results.
Upvotes: 1