Umang Garg
Umang Garg

Reputation: 115

how to shift a time series data by a month in python

I tried using the DataFrame.shift() function with freq = 'M' but when i offset by 1 month, the date get offset to the end of the month and not to the same date of the next month.

Is there any way I can offset by a exactly 1 month. i.e. if I have a time series dataframe and the 1st index value is 23rd August, after shifting for a month, I want the the value of index at 23rd Sept to come in front of value of index at 23rd August.

Please suggest a way to do that. It'll save a lot of time as otherwise, I'll have to use a loop.

I want to create a new column in this dataframe such that the value in the new column corresponding to the index 20-10-01 10:00:00 and ticker AAPL should be the value of the column 'c' at time 20-11-01 10:00:00 and ticker AAPL. and so on for the other rows. Example data:

Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}}

this is the dataset and the expected new column is: 62.35 63.3099, 63.9725, 64.025 and so on I want the values of 1 month ahead but using df['new_column'] = df.shift(1, freq = 'M')['c'] doesn't do the job

Upvotes: 3

Views: 2674

Answers (2)

k33da_the_bug
k33da_the_bug

Reputation: 820

Assuming you have unique timestamps per day and don't have any missing timestamp value. Following might work, as you just have to shift rows based on days and no of unique time stamps per day.

import pandas as pd

# Dummy data
# I assumed you have 4 unique values for a day's timestamp and don't have any missing values
lst1 = list(pd.date_range('2020-08-01 04:00:00', periods=60))
lst2 = list(pd.date_range('2020-08-01 10:00:00', periods=60))
lst3 = list(pd.date_range('2020-08-01 16:00:00', periods=60))
lst4 = list(pd.date_range('2020-08-01 22:00:00', periods=60))

lst1.extend(lst2)
lst1.extend(lst3)
lst1.extend(lst4)

data = {
    'date': lst1,
    'value': [v for v in range(0,240)]
}

# Preprocessing
df = pd.DataFrame(data)
df = df.sort_values(by=['date'])
df.reset_index(drop=True, inplace=True)

def update(row,df):
  row['value'] = df.loc[row.name]['value']
  return row

# factor is = X days of shift * Y unique time stamps per day 
factor = 31 * 4 
df.apply(update,axis=1,args=[df.shift(-factor)])

enter image description here

Upvotes: 1

David Erickson
David Erickson

Reputation: 16683

The problem is pretty straightforward but you need to some specific things with your dates in order to get n

  1. Find the number of rows that you need to shift by, which I call n and use pd.DateOffset(months=1)
  2. You need to shift upward with - by n rows

df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] + pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df
Out[1]: 
         Date       c  new_column
0  2019-10-01 56.5675       62.35
1  2019-10-01 56.2725     63.3099
2  2019-10-01 56.2925     63.9725
3  2019-10-02 55.6525      64.025
4  2019-10-02 54.8025     64.2388
..        ...     ...         ...
94 2019-11-01  64.025         nan
95 2019-11-04 64.2388         nan
96 2019-11-04  64.375         nan
97 2019-11-04 64.4975         nan
98 2019-11-05  64.575         nan

Please note that to get the above output, I used:

df = pd.DataFrame(
    {pd.Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
    pd.Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
    pd.Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
    pd.Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
    pd.Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
    pd.Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
    pd.Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
    pd.Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
    pd.Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
    pd.Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
    pd.Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
    pd.Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
    pd.Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
    pd.Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
    pd.Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
    pd.Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
    pd.Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
    pd.Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
    pd.Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
    pd.Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
    pd.Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
    pd.Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
    pd.Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
    pd.Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
    pd.Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
    pd.Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
    pd.Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
    pd.Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
    pd.Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
    pd.Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
    pd.Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
    pd.Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
    pd.Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
    pd.Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
    pd.Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
    pd.Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
    pd.Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
    pd.Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
    pd.Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
    pd.Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
    pd.Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
    pd.Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
    pd.Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
    pd.Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
    pd.Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
    pd.Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
    pd.Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
    pd.Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
    pd.Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
    pd.Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
    pd.Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
    pd.Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
    pd.Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
    pd.Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
    pd.Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
    pd.Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
    pd.Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
    pd.Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
    pd.Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
    pd.Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
    pd.Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
    pd.Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
    pd.Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
    pd.Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
    pd.Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
    pd.Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
    pd.Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
    pd.Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
    pd.Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
    pd.Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
    pd.Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
    pd.Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
    pd.Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
    pd.Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
    pd.Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
    pd.Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
    pd.Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
    pd.Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
    pd.Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
    pd.Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
    pd.Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
    pd.Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
    pd.Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
    pd.Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
    pd.Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
    pd.Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
    pd.Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
    pd.Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
    pd.Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
    pd.Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
    pd.Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
    pd.Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
    pd.Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
    pd.Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
    pd.Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
    pd.Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
    pd.Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
    pd.Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
    pd.Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}, index=['c']).T
df = df.reset_index().rename({'index': 'Date'}, axis=1)

# and then my answer:

df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] + 
pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df

Upvotes: 1

Related Questions