nish

Resetting a value in a DataFrame column at a particular time of day and subtracting consecutive rows

I have a DataFrame df as shown below:

   DateTime                  Value
   2011-01-01 01:00:00        5
   2011-01-01 01:30:00        5.5
   2011-01-01 02:00:00        6.7
   2011-01-01 02:30:00        6.9
   .
   .
   2011-01-30 23:45:00        86.5

I want to reset Value to 0.0 every day at 8:45 am. I would also like to add a new column, Difference, holding the difference between each row's Value and the Value in the row above it, starting from the 2nd row. For example, 5.5 - 5 = 0.5.

Hence my output should look like:

   DateTime                  Value    Difference
   2011-01-01 01:00:00        5          0
   2011-01-01 01:30:00        5.5        0.5
   2011-01-01 02:00:00        6.7        1.2
   2011-01-01 02:30:00        6.9        0.2
   .
   .
   2011-01-01 08:25:00        10.5       5.0
   2011-01-01 08:30:00        12.5       2.0
   2011-01-01 08:45:00        0.00       0.0
   2011-01-01 09:00:00        9.0        9.0
   .
   2011-01-30 23:45:00        86.5       2.5

How can I possibly do this?

Answers (2)

cmaher

First, create a column that will serve to indicate where each day starts (08:45):

import pandas as pd

# assuming your DataFrame is named "df"; convert DateTime to datetimes first
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['myDate'] = (df['DateTime'].dt.strftime("%H:%M") == "08:45").cumsum()

This marks every row whose time is 08:45 as True and every other row as False. Taking the cumulative sum of these booleans gives each row a group number that goes up by one at every 08:45 row, so all rows from one 08:45 up to (but not including) the next share the same number: i for one day, i+1 for the next, and so on. Now, to get Difference as you've described it, we only need to do this:

df['Difference'] = df.groupby('myDate')['Value'].diff().fillna(0)
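To see what the cumsum grouping does, here is a small self-contained sketch. The timestamps and values are made up for illustration (loosely following the question's sample), not taken from the real data:

```python
import pandas as pd

# Illustrative rows only; the real df has many more timestamps per day
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 08:00", "2011-01-01 08:30", "2011-01-01 08:45",
        "2011-01-01 09:00", "2011-01-02 08:45", "2011-01-02 09:00",
    ]),
    "Value": [10.0, 12.5, 0.0, 9.0, 0.0, 4.0],
})

# True only on the 08:45 rows
is_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"
# cumsum turns the booleans into group ids: 0 before the first 08:45,
# then 1, 2, ... starting at each 08:45 row
df["myDate"] = is_reset.cumsum()
# diff() restarts in every group, so each group's first row gets NaN -> 0
df["Difference"] = df.groupby("myDate")["Value"].diff().fillna(0)
print(df)
```

Each 08:45 row starts a new group, which is why its Difference is 0.0 instead of being computed against the 08:30 row.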

You could either drop myDate once you have Difference (df.drop('myDate', axis=1, inplace=True)), or, if you prefer a one-liner, skip the temporary column assignment entirely:

# perhaps too long for one line :)
df['Difference'] = df.groupby((df['DateTime'].dt.strftime("%H:%M") == "08:45").cumsum())['Value'].diff().fillna(0)

Output (assuming you keep the temporary column):

              DateTime  Value  myDate  Difference
1  2011-01-01 01:00:00    5.0       0         0.0
2  2011-01-01 01:30:00    5.5       0         0.5
3  2011-01-01 02:00:00    6.7       0         1.2
4  2011-01-01 02:30:00    6.9       0         0.2
5  2011-01-01 08:25:00   10.5       0         3.6
6  2011-01-01 08:30:00   12.5       0         2.0
7  2011-01-01 08:45:00    0.0       1         0.0
8  2011-01-01 09:00:00    9.0       1         9.0
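The steps above compute Difference but do not themselves set Value to 0.0 at 08:45 (the sample output already shows 0.0 there). A minimal sketch of that step, assuming DateTime is a datetime column and that an exact 08:45 timestamp exists each day; the sample values are made up:

```python
import pandas as pd

# Made-up sample rows for illustration
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 08:30", "2011-01-01 08:45", "2011-01-01 09:00",
    ]),
    "Value": [12.5, 13.1, 9.0],
})

is_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"
# Reset Value to 0.0 on every 08:45 row first
df.loc[is_reset, "Value"] = 0.0
# Then compute Difference within each 08:45-to-08:45 block
df["Difference"] = df.groupby(is_reset.cumsum())["Value"].diff().fillna(0)
print(df)
```

The order matters: resetting before the diff means the 09:00 row is compared with 0.0 rather than with the original 08:45 reading, which matches the 9.0 in the question's expected output.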

Dadep

One possible solution:

>>> df
     0
0  5.5
1  6.7
2  3.4
3  8.9
>>> # the first row has no row above it, so its difference is 0.0
>>> df[1] = [0.0] + [df[0][i] - df[0][i-1] for i in range(1, len(df[0]))]
>>> df
     0    1
0  5.5  0.0
1  6.7  1.2
2  3.4 -3.3
3  8.9  5.5

But the solution by @cmaher is much better, since it is vectorized and also restarts the difference at 08:45 each day!
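For comparison, the list comprehension computes the same thing as pandas' built-in `Series.diff`, with the first row's NaN replaced by 0.0. A sketch using the same four values:

```python
import pandas as pd

df = pd.DataFrame({0: [5.5, 6.7, 3.4, 8.9]})
# diff() subtracts the previous row; the first row has none, so fill with 0.0
df[1] = df[0].diff().fillna(0.0)
print(df)
```

Like the loop, this does not restart at 08:45; the groupby on the cumsum ids is what adds that.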
