Niccola Tartaglia
Niccola Tartaglia

Reputation: 1667

Adding additional date rows to pandas dataframe

I have a dataframe that holds quarterly earnings dates for a number stocks, looking like this:

df = pd.DataFrame(  {
'ticker': ['AAPL', 'AAPL', 'AAPL','AAPL', 'MSFT', 'MSFT','MSFT', 'MSFT'], 
'datetime': ['2015-01-01', '2015-04-01', '2015-07-01', '2015-12-01', '2015-01-01', '2015-04-01', '2015-07-01', '2015-12-01'],
})
df['datetime'] = pd.to_datetime(df['datetime'])

I would like to create an additional n rows before and after each date, to make it look like this (for n=1):

df = pd.DataFrame(  {
'ticker': ['AAPL', 'AAPL', 'AAPL','AAPL','AAPL', 'AAPL', 'AAPL','AAPL','AAPL', 'AAPL', 'AAPL','AAPL', 'MSFT', 'MSFT','MSFT', 'MSFT','MSFT', 'MSFT','MSFT', 'MSFT','MSFT', 'MSFT','MSFT', 'MSFT'], 
'datetime': ['2014-12-31','2015-01-01','2015-01-02','2015-03-30','2015-04-01', '2015-04-02', '2015-06-30','2015-07-01','2015-07-02', '2015-11-30','2015-12-01', '2015-12-02',
'2014-12-31','2015-01-01','2015-01-02','2015-03-30','2015-04-01', '2015-04-02', '2015-06-30','2015-07-01','2015-07-02', '2015-11-30','2015-12-01', '2015-12-02'],
})
df['datetime'] = pd.to_datetime(df['datetime'])

Upvotes: 1

Views: 49

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195653

Substract and add to each date one day and then .explode():

t = pd.Timedelta(days=1)
df["datetime"] = df["datetime"].apply(lambda x: [x - t, x, x + t])
df = df.explode("datetime").reset_index(drop=True)
print(df)

Prints:

   ticker   datetime
0    AAPL 2014-12-31
1    AAPL 2015-01-01
2    AAPL 2015-01-02
3    AAPL 2015-03-31
4    AAPL 2015-04-01
5    AAPL 2015-04-02
6    AAPL 2015-06-30
7    AAPL 2015-07-01
8    AAPL 2015-07-02
9    AAPL 2015-11-30
10   AAPL 2015-12-01
11   AAPL 2015-12-02
12   MSFT 2014-12-31
13   MSFT 2015-01-01
14   MSFT 2015-01-02
15   MSFT 2015-03-31
16   MSFT 2015-04-01
17   MSFT 2015-04-02
18   MSFT 2015-06-30
19   MSFT 2015-07-01
20   MSFT 2015-07-02
21   MSFT 2015-11-30
22   MSFT 2015-12-01
23   MSFT 2015-12-02

EDIT: For n > 1:

n = 2
df["datetime"] = df["datetime"].apply(
    lambda x: [x + pd.Timedelta(days=d) for d in range(-n, n)]
)
df = df.explode("datetime").reset_index(drop=True)
print(df)

Prints:

   ticker   datetime
0    AAPL 2014-12-30
1    AAPL 2014-12-31
2    AAPL 2015-01-01
3    AAPL 2015-01-02
4    AAPL 2015-03-30
5    AAPL 2015-03-31

...

Upvotes: 2

Ben.T
Ben.T

Reputation: 29635

you can use DateOffset and concat like:

n = 1 # nb of rows to add
res = (pd.concat([df.assign(datetime = df['datetime'] + pd.DateOffset(days=i)) 
                  for i in range(-n, n+1)])
         .sort_values(['ticker','datetime'])
         .reset_index(drop=True)
)
print(res.head(10))
  ticker   datetime
0   AAPL 2014-12-31
1   AAPL 2015-01-01
2   AAPL 2015-01-02
3   AAPL 2015-03-31
4   AAPL 2015-04-01
5   AAPL 2015-04-02
6   AAPL 2015-06-30
7   AAPL 2015-07-01
8   AAPL 2015-07-02
9   AAPL 2015-11-30

Upvotes: 2

Related Questions