quant
quant

Reputation: 4482

How to transform weekly data to daily in pandas?

I have the following dataframe

import pandas as pd
foo = pd.DataFrame({'date':['2019-09-30', '2019-10-07', '2019-09-30', '2019-10-07'], 'sales': [7, 14, 28, 35], 'country': ['a', 'a', 'b', 'b']})

The date value changes weekly by country.

I would like to expand this dataframe, so that the date column changes daily by country and that the daily sales value for is the weekly sales value divided by 7

Upvotes: 1

Views: 1496

Answers (1)

jezrael
jezrael

Reputation: 862541

Use DataFrameGroupBy.resample with Resampler.ffill and divide values by 7, but also is necessary add last duplicated rows by country with added 6 days for avoid omit last days of last week per groups:

foo['date'] = pd.to_datetime(foo['date'])

mask = foo['country'].duplicated(keep='last')
foo1 = foo[~mask].assign(date = lambda x: x['date'] + pd.Timedelta(6, unit='d'))
foo = foo.append(foo1, ignore_index=True)
print (foo)
        date  sales country
0 2019-09-30      7       a
1 2019-10-07     14       a
2 2019-09-30     28       b
3 2019-10-07     35       b
4 2019-10-13     14       a
5 2019-10-13     35       b

If datetimes are not ordered per groups you can use this alternative:

foo1 = (foo.loc[foo.groupby('country')['date'].idxmax()]
            .assign(date = lambda x: x['date'] + pd.Timedelta(6, unit='d')))
foo = foo.append(foo1, ignore_index=True)
print (foo)
        date  sales country
0 2019-09-30      7       a
1 2019-10-07     14       a
2 2019-09-30     28       b
3 2019-10-07     35       b
4 2019-10-13     14       a
5 2019-10-13     35       b

df = (foo.set_index('date')
         .groupby('country')['sales']
         .resample('d')
         .ffill()
         .div(7)
         .reset_index()
         )

print (df)
   country       date  sales
0        a 2019-09-30    1.0
1        a 2019-10-01    1.0
2        a 2019-10-02    1.0
3        a 2019-10-03    1.0
4        a 2019-10-04    1.0
5        a 2019-10-05    1.0
6        a 2019-10-06    1.0
7        a 2019-10-07    2.0
8        a 2019-10-08    2.0
9        a 2019-10-09    2.0
10       a 2019-10-10    2.0
11       a 2019-10-11    2.0
12       a 2019-10-12    2.0
13       a 2019-10-13    2.0
14       b 2019-09-30    4.0
15       b 2019-10-01    4.0
16       b 2019-10-02    4.0
17       b 2019-10-03    4.0
18       b 2019-10-04    4.0
19       b 2019-10-05    4.0
20       b 2019-10-06    4.0
21       b 2019-10-07    5.0
22       b 2019-10-08    5.0
23       b 2019-10-09    5.0
24       b 2019-10-10    5.0
25       b 2019-10-11    5.0
26       b 2019-10-12    5.0
27       b 2019-10-13    5.0

If not encessary add last rows:

foo['date'] = pd.to_datetime(foo['date'])

df1 = (foo.set_index('date')
          .groupby('country')['sales']
          .resample('d')
          .ffill()
          .div(7)
          .reset_index()
         )
print (df1)
   country       date  sales
0        a 2019-09-30    1.0
1        a 2019-10-01    1.0
2        a 2019-10-02    1.0
3        a 2019-10-03    1.0
4        a 2019-10-04    1.0
5        a 2019-10-05    1.0
6        a 2019-10-06    1.0
7        a 2019-10-07    2.0
8        b 2019-09-30    4.0
9        b 2019-10-01    4.0
10       b 2019-10-02    4.0
11       b 2019-10-03    4.0
12       b 2019-10-04    4.0
13       b 2019-10-05    4.0
14       b 2019-10-06    4.0
15       b 2019-10-07    5.0

Upvotes: 5

Related Questions