Reputation: 4482
I have the following dataframe
import pandas as pd
foo = pd.DataFrame({'date':['2019-09-30', '2019-10-07', '2019-09-30', '2019-10-07'], 'sales': [7, 14, 28, 35], 'country': ['a', 'a', 'b', 'b']})
The date
value changes weekly by country
.
I would like to expand this dataframe, so that the date
column changes daily by country and that the daily sales
value for is the weekly sales
value divided by 7
Upvotes: 1
Views: 1496
Reputation: 862541
Use DataFrameGroupBy.resample
with Resampler.ffill
and divide values by 7
, but also is necessary add last duplicated rows by country with added 6 days for avoid omit last days of last week per groups:
foo['date'] = pd.to_datetime(foo['date'])
mask = foo['country'].duplicated(keep='last')
foo1 = foo[~mask].assign(date = lambda x: x['date'] + pd.Timedelta(6, unit='d'))
foo = foo.append(foo1, ignore_index=True)
print (foo)
date sales country
0 2019-09-30 7 a
1 2019-10-07 14 a
2 2019-09-30 28 b
3 2019-10-07 35 b
4 2019-10-13 14 a
5 2019-10-13 35 b
If datetimes are not ordered per groups you can use this alternative:
foo1 = (foo.loc[foo.groupby('country')['date'].idxmax()]
.assign(date = lambda x: x['date'] + pd.Timedelta(6, unit='d')))
foo = foo.append(foo1, ignore_index=True)
print (foo)
date sales country
0 2019-09-30 7 a
1 2019-10-07 14 a
2 2019-09-30 28 b
3 2019-10-07 35 b
4 2019-10-13 14 a
5 2019-10-13 35 b
df = (foo.set_index('date')
.groupby('country')['sales']
.resample('d')
.ffill()
.div(7)
.reset_index()
)
print (df)
country date sales
0 a 2019-09-30 1.0
1 a 2019-10-01 1.0
2 a 2019-10-02 1.0
3 a 2019-10-03 1.0
4 a 2019-10-04 1.0
5 a 2019-10-05 1.0
6 a 2019-10-06 1.0
7 a 2019-10-07 2.0
8 a 2019-10-08 2.0
9 a 2019-10-09 2.0
10 a 2019-10-10 2.0
11 a 2019-10-11 2.0
12 a 2019-10-12 2.0
13 a 2019-10-13 2.0
14 b 2019-09-30 4.0
15 b 2019-10-01 4.0
16 b 2019-10-02 4.0
17 b 2019-10-03 4.0
18 b 2019-10-04 4.0
19 b 2019-10-05 4.0
20 b 2019-10-06 4.0
21 b 2019-10-07 5.0
22 b 2019-10-08 5.0
23 b 2019-10-09 5.0
24 b 2019-10-10 5.0
25 b 2019-10-11 5.0
26 b 2019-10-12 5.0
27 b 2019-10-13 5.0
If not encessary add last rows:
foo['date'] = pd.to_datetime(foo['date'])
df1 = (foo.set_index('date')
.groupby('country')['sales']
.resample('d')
.ffill()
.div(7)
.reset_index()
)
print (df1)
country date sales
0 a 2019-09-30 1.0
1 a 2019-10-01 1.0
2 a 2019-10-02 1.0
3 a 2019-10-03 1.0
4 a 2019-10-04 1.0
5 a 2019-10-05 1.0
6 a 2019-10-06 1.0
7 a 2019-10-07 2.0
8 b 2019-09-30 4.0
9 b 2019-10-01 4.0
10 b 2019-10-02 4.0
11 b 2019-10-03 4.0
12 b 2019-10-04 4.0
13 b 2019-10-05 4.0
14 b 2019-10-06 4.0
15 b 2019-10-07 5.0
Upvotes: 5