Reputation: 3528
I have a df as follows:
dates values
2020-03-29 00:30:00 86.824
2020-03-29 00:45:00 86.923
2020-03-29 01:00:00 87.222
2020-03-29 01:15:00 87.52
2020-03-29 01:30:00 87.918
2020-03-29 01:45:00 88.415
2020-03-29 02:00:00 89.012
2020-03-29 02:15:00 89.807
2020-03-29 02:30:00 90.504
2020-03-29 02:45:00 90.702
2020-03-29 03:00:00 90.205
2020-03-29 03:15:00 88.713
2020-03-29 03:30:00 86.724
2020-03-29 03:45:00 84.636
2020-03-29 04:00:00 83.044
2020-03-29 04:15:00 82.249
I am accounting for daylight saving time, and for that I have to delete only the timestamps in the dates column; the values in the values column should not be deleted.
Since daylight saving time starts every year on the last Sunday of March, I did the following to obtain the date of the last Sunday in March:
import datetime
currentYear = datetime.datetime.now().strftime("%Y")
march_last_day = currentYear + '-03-31'
march_last_day = datetime.datetime.strptime(march_last_day, '%Y-%m-%d')
ZEIT_UMSTELLUNG_START = march_last_day - \
datetime.timedelta((march_last_day.weekday() + 1) % 7)
ZEIT_UMSTELLUNG_START = ZEIT_UMSTELLUNG_START.replace(hour=2, minute=00)
ZEIT_UMSTELLUNG_START_2 = ZEIT_UMSTELLUNG_START + datetime.timedelta(hours=1)
This gives the following values:
ZEIT_UMSTELLUNG_START.replace(hour=2, minute=00)
Out[13]: datetime.datetime(2020, 3, 29, 2, 0)
ZEIT_UMSTELLUNG_START + datetime.timedelta(hours=1)
Out[14]: datetime.datetime(2020, 3, 29, 3, 0)
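For reference, the same calculation can be wrapped in a small helper that takes the year as an argument, which makes it easy to check against other years. This is only a sketch; the name last_sunday_of_march is hypothetical and not part of my actual code.

```python
import datetime


def last_sunday_of_march(year):
    """Return midnight on the last Sunday of March for the given year."""
    march_last_day = datetime.datetime(year, 3, 31)
    # weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 is the
    # number of days to step back to reach the most recent Sunday.
    days_since_sunday = (march_last_day.weekday() + 1) % 7
    return march_last_day - datetime.timedelta(days=days_since_sunday)


# Same window as above: 02:00 (included) to 03:00 (excluded).
start = last_sunday_of_march(2020).replace(hour=2, minute=0)
end = start + datetime.timedelta(hours=1)
print(start, end)
```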
Now I want to delete the dates that lie between ZEIT_UMSTELLUNG_START (included) and ZEIT_UMSTELLUNG_START_2 (excluded) from the dates column. The dates that follow should be 'shifted' up, but the values in the values column shouldn't be disturbed.
I did:
mask = (df['dates'] >= ZEIT_UMSTELLUNG_START) & (df['dates'] < ZEIT_UMSTELLUNG_START_2)
df = df.loc[mask]
But this filters out entire rows, so both columns are affected.
Desired output:
dates values
2020-03-29 00:30:00 86.824
2020-03-29 00:45:00 86.923
2020-03-29 01:00:00 87.222
2020-03-29 01:15:00 87.52
2020-03-29 01:30:00 87.918
2020-03-29 01:45:00 88.415
2020-03-29 03:00:00 89.012
2020-03-29 03:15:00 89.807
2020-03-29 03:30:00 90.504
2020-03-29 03:45:00 90.702
2020-03-29 04:00:00 90.205
2020-03-29 04:15:00 88.713
NaT 86.724
NaT 84.636
NaT 83.044
NaT 82.249
If you compare my desired output with the original df, you can see that the timestamps from 02:00:00 up to (but not including) 03:00:00 have been deleted and the remaining timestamps have simply been shifted up. After shifting, the last 4 rows of the dates column can be filled with NaT.
Upvotes: 1
Views: 38
Reputation: 10960
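Try setting the masked dates to NaT and then sorting the dates column, so that the NaT entries move to the end while the values column stays untouched: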
df.loc[mask, 'dates'] = pd.NaT
df['dates'] = df['dates'].sort_values(ascending=True).tolist()
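Here is a self-contained sketch of the two lines above applied to the sample data from the question. The frame is rebuilt with pd.date_range, and the names start and end stand in for ZEIT_UMSTELLUNG_START and ZEIT_UMSTELLUNG_START_2; both are assumptions made only so the example runs on its own.

```python
import datetime

import pandas as pd

# Rebuild the sample frame: 15-minute steps from 00:30 to 04:15.
values = [86.824, 86.923, 87.222, 87.52, 87.918, 88.415, 89.012, 89.807,
          90.504, 90.702, 90.205, 88.713, 86.724, 84.636, 83.044, 82.249]
df = pd.DataFrame({
    "dates": pd.date_range("2020-03-29 00:30:00", periods=len(values), freq="15min"),
    "values": values,
})

# Stand-ins for ZEIT_UMSTELLUNG_START and ZEIT_UMSTELLUNG_START_2.
start = datetime.datetime(2020, 3, 29, 2, 0)
end = start + datetime.timedelta(hours=1)

# Blank out only the timestamps inside the skipped hour.
mask = (df["dates"] >= start) & (df["dates"] < end)
df.loc[mask, "dates"] = pd.NaT

# sort_values places NaT last by default; tolist() drops the index so the
# sorted dates are assigned by position instead of being realigned.
df["dates"] = df["dates"].sort_values(ascending=True).tolist()
print(df)
```

The tolist() call matters: assigning the sorted Series directly would align it on the index and put every date back in its original row.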
Upvotes: 1