J.D
J.D

Reputation: 1971

Python: Expand a dataframe row-wise based on datetime

I have a dataframe like this:

    ID       Date  Value
783  C 2018-02-23  0.704
580  B 2018-08-04 -1.189
221  A 2018-08-10 -0.788
228  A 2018-08-17  0.038
578  B 2018-08-02  1.188

What I want is expanding the dataframe based on Date column to 1-month earlier, and fill ID with the same person, and fill Value with nan until the last observation.

The expected result is similar to this:

    ID        Date  Value
0    C  2018/01/24    nan
1    C  2018/01/25    nan
2    C  2018/01/26    nan
3    C  2018/01/27    nan
4    C  2018/01/28    nan
5    C  2018/01/29    nan
6    C  2018/01/30    nan
7    C  2018/01/31    nan
8    C  2018/02/01    nan
9    C  2018/02/02    nan
10   C  2018/02/03    nan
11   C  2018/02/04    nan
12   C  2018/02/05    nan
13   C  2018/02/06    nan
14   C  2018/02/07    nan
15   C  2018/02/08    nan
16   C  2018/02/09    nan
17   C  2018/02/10    nan
18   C  2018/02/11    nan
19   C  2018/02/12    nan
20   C  2018/02/13    nan
21   C  2018/02/14    nan
22   C  2018/02/15    nan
23   C  2018/02/16    nan
24   C  2018/02/17    nan
25   C  2018/02/18    nan
26   C  2018/02/19    nan
27   C  2018/02/20    nan
28   C  2018/02/21    nan
29   C  2018/02/22    nan
30   C  2018/02/23  1.093
31   B  2018/07/05    nan
32   B  2018/07/06    nan
33   B  2018/07/07    nan
34   B  2018/07/08    nan
35   B  2018/07/09    nan
36   B  2018/07/10    nan
37   B  2018/07/11    nan
38   B  2018/07/12    nan
39   B  2018/07/13    nan
40   B  2018/07/14    nan
41   B  2018/07/15    nan
42   B  2018/07/16    nan
43   B  2018/07/17    nan
44   B  2018/07/18    nan
45   B  2018/07/19    nan
46   B  2018/07/20    nan
47   B  2018/07/21    nan
48   B  2018/07/22    nan
49   B  2018/07/23    nan
50   B  2018/07/24    nan
51   B  2018/07/25    nan
52   B  2018/07/26    nan
53   B  2018/07/27    nan
54   B  2018/07/28    nan
55   B  2018/07/29    nan
56   B  2018/07/30    nan
57   B  2018/07/31    nan
58   B  2018/08/01    nan
59   B  2018/08/02    nan
60   B  2018/08/03    nan
61   B  2018/08/04  0.764
62   A  2018/07/11    nan
63   A  2018/07/12    nan
64   A  2018/07/13    nan
65   A  2018/07/14    nan
66   A  2018/07/15    nan
67   A  2018/07/16    nan
68   A  2018/07/17    nan
69   A  2018/07/18    nan
70   A  2018/07/19    nan
71   A  2018/07/20    nan
72   A  2018/07/21    nan
73   A  2018/07/22    nan
74   A  2018/07/23    nan
75   A  2018/07/24    nan
76   A  2018/07/25    nan
77   A  2018/07/26    nan
78   A  2018/07/27    nan
79   A  2018/07/28    nan
80   A  2018/07/29    nan
81   A  2018/07/30    nan
82   A  2018/07/31    nan
83   A  2018/08/01    nan
84   A  2018/08/02    nan
85   A  2018/08/03    nan
86   A  2018/08/04    nan
87   A  2018/08/05    nan
88   A  2018/08/06    nan
89   A  2018/08/07    nan
90   A  2018/08/08    nan
91   A  2018/08/09    nan
92   A  2018/08/10  2.144
93   A  2018/07/18    nan
94   A  2018/07/19    nan
95   A  2018/07/20    nan
96   A  2018/07/21    nan
97   A  2018/07/22    nan
98   A  2018/07/23    nan
99   A  2018/07/24    nan
100  A  2018/07/25    nan
101  A  2018/07/26    nan
102  A  2018/07/27    nan
103  A  2018/07/28    nan
104  A  2018/07/29    nan
105  A  2018/07/30    nan
106  A  2018/07/31    nan
107  A  2018/08/01    nan
108  A  2018/08/02    nan
109  A  2018/08/03    nan
110  A  2018/08/04    nan
111  A  2018/08/05    nan
112  A  2018/08/06    nan
113  A  2018/08/07    nan
114  A  2018/08/08    nan
115  A  2018/08/09    nan
116  A  2018/08/10    nan
117  A  2018/08/11    nan
118  A  2018/08/12    nan
119  A  2018/08/13    nan
120  A  2018/08/14    nan
121  A  2018/08/15    nan
122  A  2018/08/16    nan
123  A  2018/08/17  0.644
124  B  2018/07/03    nan
125  B  2018/07/04    nan
126  B  2018/07/05    nan
127  B  2018/07/06    nan
128  B  2018/07/07    nan
129  B  2018/07/08    nan
130  B  2018/07/09    nan
131  B  2018/07/10    nan
132  B  2018/07/11    nan
133  B  2018/07/12    nan
134  B  2018/07/13    nan
135  B  2018/07/14    nan
136  B  2018/07/15    nan
137  B  2018/07/16    nan
138  B  2018/07/17    nan
139  B  2018/07/18    nan
140  B  2018/07/19    nan
141  B  2018/07/20    nan
142  B  2018/07/21    nan
143  B  2018/07/22    nan
144  B  2018/07/23    nan
145  B  2018/07/24    nan
146  B  2018/07/25    nan
147  B  2018/07/26    nan
148  B  2018/07/27    nan
149  B  2018/07/28    nan
150  B  2018/07/29    nan
151  B  2018/07/30    nan
152  B  2018/07/31    nan
153  B  2018/08/01    nan
154  B  2018/08/02 -0.767

The source data can be created as below:

import pandas as pd
from itertools import chain
import numpy as np


df_1 = pd.DataFrame({
        'ID' : list(chain.from_iterable([['A'] * 365, ['B'] * 365, ['C'] * 365])),
        'Date' : pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist() + pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist() + pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist(),
        'Value' : np.random.randn(365 * 3)
        })

df_1 = df_1.sample(5, random_state = 123)

Thanks for the advice!

Upvotes: 3

Views: 282

Answers (2)

kantal
kantal

Reputation: 2407

We can create a date range in the "Date" column, then explode it. Then group the "Value" column by the index and set values to nan but the last. Finally reset the index.

def drange(t):
    return pd.date_range( t-pd.DateOffset(months=1)+pd.DateOffset(days=1),t,freq="D",normalize=True)

df["Date"]= df["Date"].transform(drange)

      ID                                               Date  Value
index                                                             
783    C  DatetimeIndex(['2018-01-24', '2018-01-25', '20...  0.704
580    B  DatetimeIndex(['2018-07-05', '2018-07-06', '20... -1.189
221    A  DatetimeIndex(['2018-07-11', '2018-07-12', '20... -0.788
228    A  DatetimeIndex(['2018-07-18', '2018-07-19', '20...  0.038
578    B  DatetimeIndex(['2018-07-03', '2018-07-04', '20...  1.188


df= df.reset_index(drop=True).explode(column="Date")

   ID       Date  Value
0   C 2018-01-24  0.704
0   C 2018-01-25  0.704
0   C 2018-01-26  0.704
0   C 2018-01-27  0.704
0   C 2018-01-28  0.704
.. ..        ...    ...
4   B 2018-07-29  1.188
4   B 2018-07-30  1.188
4   B 2018-07-31  1.188
4   B 2018-08-01  1.188
4   B 2018-08-02  1.188


df["Value"]= df.groupby(level=0)["Value"].transform(lambda v: [np.nan]*(len(v)-1)+[v.iloc[0]])
df= df.reset_index(drop=True)

    ID       Date  Value
0    C 2018-01-24    NaN
1    C 2018-01-25    NaN
2    C 2018-01-26    NaN
3    C 2018-01-27    NaN
4    C 2018-01-28    NaN
..  ..        ...    ...
150  B 2018-07-29    NaN
151  B 2018-07-30    NaN
152  B 2018-07-31    NaN
153  B 2018-08-01    NaN
154  B 2018-08-02  1.188

Upvotes: 1

jezrael
jezrael

Reputation: 862641

You can create another DataFrame with previous months, then join together by concat, create DatetimeIndex, so possible use groupby with resample by d for days for add all values between:

df_2 = df_1.assign(Date = df_1['Date'] - pd.DateOffset(months=1) + pd.DateOffset(days=1),
                   Value = np.nan)

df = (pd.concat([df_2, df_1], sort=False)
        .reset_index()
        .set_index('Date')
        .groupby('index', sort=False)
        .resample('D')
        .ffill()
        .reset_index(level=1)
        .drop('index', 1)
        .rename_axis(None))
print (df)

          Date ID     Value
783 2018-01-24  C       NaN
783 2018-01-25  C       NaN
783 2018-01-26  C       NaN
783 2018-01-27  C       NaN
783 2018-01-28  C       NaN
..         ... ..       ...
578 2018-07-29  B       NaN
578 2018-07-30  B       NaN
578 2018-07-31  B       NaN
578 2018-08-01  B       NaN
578 2018-08-02  B  0.562684

[155 rows x 3 columns]

Another solution with list comprehension and concat, but last is necessary back filling of columns for index and ID, solution working if no missing value in original ID column:

offset = pd.DateOffset(months=1) + pd.DateOffset(days=1)
df=pd.concat([df_1.iloc[[i]].reset_index().set_index('Date').reindex(pd.date_range(d-offset,d))
                      for i, d in enumerate(df_1['Date'])], sort=False)
df = (df.assign(index = df['index'].bfill().astype(int), ID = df['ID'].bfill())
        .rename_axis('Date')
        .reset_index()
        .set_index('index')
        .rename_axis(None)
)
print (df)

          Date ID     Value
783 2018-01-24  C       NaN
783 2018-01-25  C       NaN
783 2018-01-26  C       NaN
783 2018-01-27  C       NaN
783 2018-01-28  C       NaN
..         ... ..       ...
578 2018-07-29  B       NaN
578 2018-07-30  B       NaN
578 2018-07-31  B       NaN
578 2018-08-01  B       NaN
578 2018-08-02  B  1.224345

[155 rows x 3 columns]

Upvotes: 3

Related Questions