Reputation: 271
I have a DataFrame df as shown below:
DateTime Value
2011-01-01 01:00:00 5
2011-01-01 01:30:00 5.5
2011-01-01 02:00:00 6.7
2011-01-01 02:30:00 6.9
.
.
2011-01-30 23:45:00 86.5
I want to reset Value to 0.0 every day at 8:45 am. I would also like to add a new column, Difference, which holds the difference between each row's Value and the previous row's Value, starting from the 2nd row. For example, 5.5 - 5 = 0.5.
Hence my output should look like:
DateTime Value Difference
2011-01-01 01:00:00 5 0
2011-01-01 01:30:00 5.5 0.5
2011-01-01 02:00:00 6.7 1.2
2011-01-01 02:30:00 6.9 0.2
.
.
2011-01-01 08:25:00 10.5 5.0
2011-01-01 08:30:00 12.5 2.0
2011-01-01 08:45:00 0.00 0.0
2011-01-01 09:00:00 9.0 9.0
.
2011-01-30 23:45:00 86.5 2.5
How can I possibly do this?
Upvotes: 2
Views: 1165
Reputation: 5215
First, create a column that will serve to indicate where each day starts (08:45):
# assuming your DataFrame is named "df"
# pd.to_datetime makes this work whether df['DateTime'] holds strings or datetime values
# (pd.datetime was removed in pandas 2.0, so the .dt accessor is used instead)
df['myDate'] = (pd.to_datetime(df['DateTime']).dt.strftime("%H:%M") == "08:45").cumsum()
This marks every row where the time is 08:45 as True and the rest as False. When we take the cumulative sum of these flags, every row from one 08:45 up to the next gets the same integer: i for one day, i+1 for the next, and so on. Now, to get Difference as you've described it, we only need to do this:
df['Difference'] = df.groupby('myDate')['Value'].diff().fillna(0)
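To make the cumulative-sum trick concrete, here is a minimal sketch on a few made-up rows spanning two days. The sample timestamps and values, and the names is_reset and group_id, are my own assumptions for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data; timestamps and values are illustrative only.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 08:30:00",
        "2011-01-01 08:45:00",
        "2011-01-01 09:00:00",
        "2011-01-02 08:30:00",
        "2011-01-02 08:45:00",
        "2011-01-02 09:00:00",
    ]),
    "Value": [12.5, 0.0, 9.0, 20.0, 0.0, 4.0],
})

# True only on the rows stamped 08:45
is_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"

# Summing booleans cumulatively bumps the counter by one at each 08:45 row,
# so all rows between two consecutive 08:45 marks share one group id
group_id = is_reset.cumsum()

# diff() restarts inside each group; the first row of a group is NaN, filled with 0
difference = df.groupby(group_id)["Value"].diff().fillna(0)

print(group_id.tolist())    # [0, 1, 1, 1, 2, 2]
print(difference.tolist())  # [0.0, 0.0, 9.0, 11.0, 0.0, 4.0]
```

Note that a group runs from 08:45 on one day to just before 08:45 on the next, so the difference is only restarted at the 08:45 row and not at midnight.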
You could either drop myDate once you have Difference (df.drop('myDate', axis=1, inplace=True)), or, if your preference is to make this a one-liner, you could skip the temporary column assignment completely:
# perhaps too long for one line :)
df['Difference'] = df.groupby((pd.to_datetime(df['DateTime']).dt.strftime("%H:%M") == "08:45").cumsum())['Value'].diff().fillna(0)
Output (assuming you keep the temporary column):
DateTime Value myDate Difference
1 2011-01-01 01:00:00 5.0 0 0.0
2 2011-01-01 01:30:00 5.5 0 0.5
3 2011-01-01 02:00:00 6.7 0 1.2
4 2011-01-01 02:30:00 6.9 0 0.2
5 2011-01-01 08:25:00 10.5 0 3.6
6 2011-01-01 08:30:00 12.5 0 2.0
7 2011-01-01 08:45:00 0.0 1 0.0
8 2011-01-01 09:00:00 9.0 1 9.0
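The output above already has Value equal to 0.0 at 08:45. If the raw data does not, one possible way to do the reset from the question is a boolean mask with .loc before computing the difference. This is only a sketch: it assumes "reset" means overwriting the single 08:45 row, and the sample rows and the name is_reset are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the raw Value at 08:45 is not yet zero.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 08:25:00",
        "2011-01-01 08:30:00",
        "2011-01-01 08:45:00",
        "2011-01-01 09:00:00",
    ]),
    "Value": [10.5, 12.5, 13.0, 9.0],
})

# Flag the rows stamped 08:45
is_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"

# Assumption: "reset" means overwriting Value on the 08:45 rows only
df.loc[is_reset, "Value"] = 0.0

# Same grouped difference as above, computed after the reset
df["Difference"] = df.groupby(is_reset.cumsum())["Value"].diff().fillna(0)

print(df["Value"].tolist())       # [10.5, 12.5, 0.0, 9.0]
print(df["Difference"].tolist())  # [0.0, 2.0, 0.0, 9.0]
```

Doing the reset first matters: the 09:00 row's Difference is then measured from 0.0, which matches the 9.0 shown in the question's expected output.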
Upvotes: 3
Reputation: 2788
A solution could be:
>>> df
0
0 5.5
1 6.7
2 3.4
3 8.9
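>>> df[1] = 0.0  # new column; row 0 keeps 0.0 because it has no previous row
>>> # .loc avoids chained assignment (df[1][0] = ...), which can silently fail to update df
>>> df.loc[1:, 1] = [df[0][i] - df[0][i-1] for i in range(1, len(df[0]))]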
>>> df
0 1
0 5.5 0.0
1 6.7 1.2
2 3.4 -3.3
3 8.9 5.5
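For what it's worth, the same row-to-row subtraction can probably be written without a Python loop by using Series.diff(). A minimal sketch, reusing the four values from the transcript above:

```python
import pandas as pd

# Same four values as in the transcript above
df = pd.DataFrame({0: [5.5, 6.7, 3.4, 8.9]})

# diff() subtracts the previous row from each row; the first row has no
# predecessor and comes back as NaN, which is replaced by 0.0 here
df[1] = df[0].diff().fillna(0.0)

print(df)
```

This should print the same two columns as the loop version, up to floating-point rounding.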
But the solution by @cmaher is much better!
Upvotes: 1