Reputation: 115
I tried using the DataFrame.shift()
function with freq = 'M'
but when i offset by 1 month, the date get offset to the end of the month and not to the same date of the next month.
Is there any way I can offset by a exactly 1 month. i.e. if I have a time series dataframe and the 1st index value is 23rd August
, after shifting for a month, I want the the value of index at 23rd Sept
to come in front of value of index at 23rd August
.
Please suggest a way to do that. It'll save a lot of time as otherwise, I'll have to use a loop.
I want to create a new column in this dataframe such that the value in the new column corresponding to the index 20-10-01 10:00:00 and ticker AAPL should be the value of the column 'c' at time 20-11-01 10:00:00 and ticker AAPL. and so on for the other rows. Example data:
Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}}
this is the dataset
and the expected new column is: 62.35
63.3099, 63.9725, 64.025 and so on
I want the values of 1 month ahead
but using df['new_column'] = df.shift(1, freq = 'M')['c']
doesn't do the job
Upvotes: 3
Views: 2674
Reputation: 820
Assuming you have unique timestamps per day and don't have any missing timestamp value. Following might work, as you just have to shift rows based on days and no of unique time stamps per day.
import pandas as pd
# Dummy data
# I assumed you have 4 unique values for a day's timestamp and don't have any missing values
lst1 = list(pd.date_range('2020-08-01 04:00:00', periods=60))
lst2 = list(pd.date_range('2020-08-01 10:00:00', periods=60))
lst3 = list(pd.date_range('2020-08-01 16:00:00', periods=60))
lst4 = list(pd.date_range('2020-08-01 22:00:00', periods=60))
lst1.extend(lst2)
lst1.extend(lst3)
lst1.extend(lst4)
data = {
'date': lst1,
'value': [v for v in range(0,240)]
}
# Preprocessing
df = pd.DataFrame(data)
df = df.sort_values(by=['date'])
df.reset_index(drop=True, inplace=True)
def update(row,df):
row['value'] = df.loc[row.name]['value']
return row
# factor is = X days of shift * Y unique time stamps per day
factor = 31 * 4
df.apply(update,axis=1,args=[df.shift(-factor)])
Upvotes: 1
Reputation: 16683
The problem is pretty straightforward but you need to some specific things with your dates in order to get n
n
and use pd.DateOffset(months=1)
-
by n
rowsdf['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] + pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df
Out[1]:
Date c new_column
0 2019-10-01 56.5675 62.35
1 2019-10-01 56.2725 63.3099
2 2019-10-01 56.2925 63.9725
3 2019-10-02 55.6525 64.025
4 2019-10-02 54.8025 64.2388
.. ... ... ...
94 2019-11-01 64.025 nan
95 2019-11-04 64.2388 nan
96 2019-11-04 64.375 nan
97 2019-11-04 64.4975 nan
98 2019-11-05 64.575 nan
Please note that to get the above output, I used:
df = pd.DataFrame(
{pd.Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
pd.Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
pd.Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
pd.Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
pd.Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
pd.Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
pd.Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
pd.Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
pd.Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
pd.Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
pd.Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
pd.Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
pd.Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
pd.Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
pd.Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
pd.Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
pd.Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
pd.Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
pd.Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
pd.Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
pd.Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
pd.Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
pd.Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
pd.Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
pd.Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
pd.Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
pd.Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
pd.Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
pd.Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
pd.Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
pd.Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
pd.Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
pd.Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
pd.Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
pd.Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
pd.Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
pd.Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
pd.Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
pd.Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
pd.Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
pd.Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
pd.Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
pd.Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
pd.Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
pd.Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
pd.Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
pd.Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
pd.Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
pd.Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
pd.Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
pd.Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
pd.Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
pd.Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
pd.Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
pd.Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
pd.Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
pd.Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
pd.Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
pd.Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
pd.Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
pd.Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
pd.Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
pd.Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
pd.Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
pd.Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
pd.Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
pd.Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
pd.Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
pd.Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
pd.Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
pd.Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
pd.Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
pd.Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
pd.Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
pd.Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
pd.Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
pd.Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
pd.Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
pd.Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
pd.Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
pd.Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
pd.Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
pd.Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
pd.Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
pd.Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
pd.Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
pd.Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
pd.Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
pd.Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
pd.Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
pd.Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
pd.Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
pd.Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
pd.Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
pd.Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
pd.Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
pd.Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
pd.Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
pd.Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}, index=['c']).T
df = df.reset_index().rename({'index': 'Date'}, axis=1)
# and then my answer:
df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] +
pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df
Upvotes: 1