DrakeMurdoch
DrakeMurdoch

Reputation: 859

Turn column with datetime values into a column for each week

I have data that looks like

color  |  date      | sales
green  | 2018-04-19 |   6
red    | 2018-12-01 |   3
blue   | 2018-09-14 |   7
 .     |     .      |   .
 .     |     .      |   .

Where all the dates are between 2018-01-01 and 2019-01-01 and there are many colors in the colors column that are repeated many times. For example Red could show up on various dates and Green could as well, and they can have sales on the same dates. There are several thousand records and a few other columns that aren't relevant to this query.

I have this in a dataframe and have sorted by the date column in ascending order and reset the index. What I would like to do, however, is make a column for each week, where each row is a unique color and the sum of the color's sales during that week.

I thought something like df_sales['date'].resample('W').sum() and then pivoting the data might do the trick, but that doesn't work. I know that in order to use resample I need to set the index to the date column via df_sales.set_index(pd.DatetimeIndex(df_sales['date']), inplace=True). But running print(df_shoes['date'].resample('W')) just gives me DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0] which doesn't really help me.

It won't let me use the sum() function at all.

Ideally, my results would look like (or something similar where each column is a week):

color  |  2018-01-01  | 2018-01-08  | ... | 2018-12-24
green  |      14      |      0      | ... |     6
indigo |      9       |      3      | ... |     18

Any methods that would help me achieve this would be great!

Thanks

Upvotes: 3

Views: 302

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

OK, so now that I understood your question, can you try:

# toy data frame
np.random.seed(2019)
dates = pd.date_range('2018-01-01', '2018-1-21', freq='D')
colors = np.random.randint(0,3, len(dates))
values = np.random.normal(10,20, len(dates))

df = pd.DataFrame({'dates': dates,
                   'colors': colors,
                   'values': values})    

df.groupby('colors').resample(rule='W', on='dates').values.sum().reset_index()

Output:

+---+--------+------------+------------+
|   | colors |   dates    |   values   |
+---+--------+------------+------------+
| 0 |      0 | 2018-01-07 | 103.963813 |
| 1 |      0 | 2018-01-14 | 52.569041  |
| 2 |      0 | 2018-01-21 | 67.987307  |
| 3 |      1 | 2018-01-07 | 2.940078   |
| 4 |      1 | 2018-01-14 | -18.655026 |
| 5 |      1 | 2018-01-21 | -7.875686  |
| 6 |      2 | 2018-01-07 | 21.254167  |
| 7 |      2 | 2018-01-14 | 8.990712   |
| 8 |      2 | 2018-01-21 | -4.180851  |
+---+--------+------------+------------+

Upvotes: 1

Nev1111
Nev1111

Reputation: 1049

Let's try this one more time.

convert column date to a timestamp

df['date']=pd.to_datetime(df['date']).dt.date

then find locate beginning of week

 df['BeginWeek']=df['date']-pd.DateOffset(weekday=0,weeks=1)

and then pivot over the result

df_pivot=df.pivot_table(index='color',columns='BeginWeek',values='sales',aggfunc='sum').fillna(0)

Upvotes: 1

Related Questions