Reputation: 159
How can I use the func diff() resetting the result to zero if the date in the current row is different from the date in the previous?
For instance, I have the df
below containing ts
and value
, when generating value_diff
I can use:
df['value_diff'] = df.value.diff()
but in this case the row of index 4 will have value_diff = 200
and I need it to reset to zero because date has changed.
i ts value value_diff
0 2019-01-02 11:48:01.001 100 0
1 2019-01-02 14:26:01.001 150 50
2 2019-01-02 16:12:01.001 75 -75
3 2019-01-02 18:54:01.001 50 -25
4 2019-01-03 09:12:01.001 250 0
5 2019-01-03 12:25:01.001 310 60
6 2019-01-03 16:50:01.001 45 -265
7 2019-01-03 17:10:01.001 30 -15
I know I can build a loop for it, but I was wondering if it can be solved in a more fancy way, maybe using lambda functions.
Upvotes: 1
Views: 1464
Reputation: 13175
You want to use groupby
and then fillna
to get the 0 values.
import pandas as pd
# Reading your example and getting back to correct format from clipboard
df = pd.read_clipboard()
df['ts'] = df['i'] + ' ' + df['ts']
df.drop(['i', 'value_diff'], axis=1, inplace=True) # The columns get misaligned from reading clipboard
# Now we have your original
print(df.head())
# Convert ts to datetime
df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True)
# Add a date column for us to groupby
df['date'] = df['ts'].dt.date
# Apply diff and fillna
df['value_diff'] = df.groupby('date')['value'].diff().fillna(0)
Upvotes: 2