Reputation: 859
I have data that looks like
color | date | sales
green | 2018-04-19 | 6
red | 2018-12-01 | 3
blue | 2018-09-14 | 7
. | . | .
. | . | .
Where all the dates are between 2018-01-01 and 2019-01-01 and there are many colors in the colors column that are repeated many times. For example Red could show up on various dates and Green could as well, and they can have sales on the same dates. There are several thousand records and a few other columns that aren't relevant to this query.
I have this in a dataframe and have sorted by the date
column in ascending order and reset the index. What I would like to do, however, is make a column for each week, where each row is a unique color and the sum of the color's sales during that week.
I thought something like df_sales['date'].resample('W').sum()
and then pivoting the data might do the trick, but that doesn't work. I know that in order to use resample
I need to set the index to the date
column via df_sales.set_index(pd.DatetimeIndex(df_sales['date']), inplace=True)
. But running print(df_shoes['date'].resample('W'))
just gives me DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]
which doesn't really help me.
It won't let me use the sum()
function at all.
Ideally, my results would look like (or something similar where each column is a week):
color | 2018-01-01 | 2018-01-08 | ... | 2018-12-24
green | 14 | 0 | ... | 6
indigo | 9 | 3 | ... | 18
Any methods that would help me achieve this would be great!
Thanks
Upvotes: 3
Views: 302
Reputation: 150735
OK, so now that I understood your question, can you try:
# toy data frame
np.random.seed(2019)
dates = pd.date_range('2018-01-01', '2018-1-21', freq='D')
colors = np.random.randint(0,3, len(dates))
values = np.random.normal(10,20, len(dates))
df = pd.DataFrame({'dates': dates,
'colors': colors,
'values': values})
df.groupby('colors').resample(rule='W', on='dates').values.sum().reset_index()
Output:
+---+--------+------------+------------+
| | colors | dates | values |
+---+--------+------------+------------+
| 0 | 0 | 2018-01-07 | 103.963813 |
| 1 | 0 | 2018-01-14 | 52.569041 |
| 2 | 0 | 2018-01-21 | 67.987307 |
| 3 | 1 | 2018-01-07 | 2.940078 |
| 4 | 1 | 2018-01-14 | -18.655026 |
| 5 | 1 | 2018-01-21 | -7.875686 |
| 6 | 2 | 2018-01-07 | 21.254167 |
| 7 | 2 | 2018-01-14 | 8.990712 |
| 8 | 2 | 2018-01-21 | -4.180851 |
+---+--------+------------+------------+
Upvotes: 1
Reputation: 1049
Let's try this one more time.
convert column date to a timestamp
df['date']=pd.to_datetime(df['date']).dt.date
then find locate beginning of week
df['BeginWeek']=df['date']-pd.DateOffset(weekday=0,weeks=1)
and then pivot over the result
df_pivot=df.pivot_table(index='color',columns='BeginWeek',values='sales',aggfunc='sum').fillna(0)
Upvotes: 1