Reputation: 723
I have a time-series DataFrame with three columns (date, time and value) that looks like this:
**date** **time** **value**
11.03.2020 1103 5
11.03.2020 0000 10
11.03.2020 0100 6
12.03.2020 0201 8
12.03.2020 0305 7
12.03.2020 0400 4
Basically, the time column increments by 60 (±5) minutes from row to row. I want to correct the date column so that whenever the time is 0000 (±5), the day part of the date increments by 1 and stays at that value until the next 0000 (±5) time value is encountered, then increments by 1 again, and so on until the end of the data frame is reached.
The result should look like this:
**date** **time** **value**
11.03.2020 1103 5
12.03.2020 0000 10
12.03.2020 0100 6
12.03.2020 0201 8
12.03.2020 0305 7
12.03.2020 0400 4
I would appreciate some help. Thanks
Upvotes: 4
Views: 110
Reputation: 71687
Parse the strings in column `date` as datetime:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
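To make the parsing step reproducible, here is a minimal sketch that rebuilds the sample frame from the question. It assumes the `time` column holds zero-padded strings; the question does not show the dtypes, so that part is a guess.

```python
import pandas as pd

# Sample data from the question. Storing `time` as zero-padded
# strings is an assumption, not something the question states.
df = pd.DataFrame({
    "date": ["11.03.2020"] * 3 + ["12.03.2020"] * 3,
    "time": ["1103", "0000", "0100", "0201", "0305", "0400"],
    "value": [5, 10, 6, 8, 7, 4],
})

# dayfirst=True makes 11.03.2020 parse as 11 March rather than 3 November.
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
```

Without `dayfirst=True`, an ambiguous string such as `11.03.2020` may be read month-first, which would break the later date comparison.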
Create a boolean mask `m` by comparing the `time` column with `0000`. Using boolean indexing, add a `DateOffset` of 1 day to the values in the `date` column where the mask holds true, then `mask` and forward fill (`ffill`) the values in the updated `date` column where the current date is less than the previous date:
m = df['time'].eq('0000')
df.loc[m, 'date'] += pd.DateOffset(days=1)
df['date'] = df['date'].mask(df['date'].diff().dt.days.lt(0)).ffill()
date time value
0 2020-03-11 1103 5
1 2020-03-12 0000 10
2 2020-03-12 0100 6
3 2020-03-12 0201 8
4 2020-03-12 0305 7
5 2020-03-12 0400 4
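The mask above matches `0000` exactly, while the question asks for a ±5 minute window. A possible variation, offered only as a sketch: convert `time` to minutes since midnight and treat anything within `tol` minutes of midnight (on either side) as a rollover row. The helper name `fix_dates` and the `tol` parameter are hypothetical, and `time` is again assumed to be an HHMM string or integer.

```python
import pandas as pd

def fix_dates(df, tol=5):
    """Hypothetical helper: same steps as above, with a +-tol minute window."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], dayfirst=True)

    # Convert HHMM to minutes since midnight (zero-pad in case of integers).
    hhmm = out["time"].astype(str).str.zfill(4)
    minutes = hhmm.str[:2].astype(int) * 60 + hhmm.str[2:].astype(int)

    # True for 0000..0005 and for 2355..2359.
    m = (minutes <= tol) | (minutes >= 24 * 60 - tol)

    # Bump the rollover rows, then blank out and forward fill any row
    # whose date is now earlier than the row before it.
    out.loc[m, "date"] += pd.DateOffset(days=1)
    out["date"] = out["date"].mask(out["date"].diff().dt.days.lt(0)).ffill()
    return out

sample = pd.DataFrame({
    "date": ["11.03.2020"] * 3 + ["12.03.2020"] * 3,
    "time": ["1103", "2357", "0100", "0201", "0305", "0400"],
    "value": [5, 10, 6, 8, 7, 4],
})
fixed = fix_dates(sample)
```

Here the second row is stamped `2357` rather than `0000`, and it is still treated as the start of the next day. Whether a reading just before midnight should count as the new day depends on the data, so adjust the window if that does not fit.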
Upvotes: 4