(python) Using diff() function in a DataFrame

Question

How can I use the func diff() resetting the result to zero if the date in the current row is different from the date in the previous?

For instance, I have the df below containing ts and value, when generating value_diff I can use:

df['value_diff'] = df.value.diff()

but in this case the row of index 4 will have value_diff = 200 and I need it to reset to zero because date has changed.

i  ts                       value  value_diff
0  2019-01-02 11:48:01.001  100    0
1  2019-01-02 14:26:01.001  150    50
2  2019-01-02 16:12:01.001  75    -75
3  2019-01-02 18:54:01.001  50    -25
4  2019-01-03 09:12:01.001  250   0
5  2019-01-03 12:25:01.001  310   60
6  2019-01-03 16:50:01.001  45    -265
7  2019-01-03 17:10:01.001  30    -15

I know I can build a loop for it, but I was wondering if it can be solved in a more fancy way, maybe using lambda functions.

roganjosh · Accepted Answer

You want to use groupby and then fillna to get the 0 values.

import pandas as pd

# Reading your example and getting back to correct format from clipboard
df = pd.read_clipboard()
df['ts'] = df['i'] + ' ' + df['ts']
df.drop(['i', 'value_diff'], axis=1, inplace=True) # The columns get misaligned from reading clipboard

# Now we have your original
print(df.head())

# Convert ts to datetime
df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True)

# Add a date column for us to groupby
df['date'] = df['ts'].dt.date

# Apply diff and fillna
df['value_diff'] = df.groupby('date')['value'].diff().fillna(0)

(python) Using diff() function in a DataFrame

Answers (1)

Related Questions