Reputation: 213
name date value
a 1/1/2011 3
b 1/1/2011 5
c 1/1/2011 7
a 1/2/2011 6
b 1/2/2011 10
c 1/2/2011 14
I have a df here where the value is cumulative stats. So the Actual value of name: a
date: 1/2/2011
is 3 not 6. To get the actual value of a particular day, I need to take that day's value minus the previous day's value. I want to calculate the actual value of each name for each date. Something along the lines of df.groupby(['name', 'date'])['value'].diff()
but this code is returning error.
In the end what I need is
name date actual value
a 1/1/2011 3
b 1/1/2011 5
c 1/1/2011 7
a 1/2/2011 3
b 1/2/2011 5
c 1/2/2011 7
Upvotes: 1
Views: 88
Reputation: 13175
This can be done in a single line and in a vectorized way.
import pandas as pd
df = pd.read_clipboard() # Reading from your question
df['value'] = df.groupby('name')['value'].diff(1).fillna(df['value'])
As was discussed in the comments, it is necessary to reference the original 'values
Series when applying fillna
to correctly replace the NaN
values from diff
(this occurs for the first instance of each label in 'name'
).
Upvotes: 3
Reputation: 1722
df['value'] = df['value'].fillna(method='ffill')
df = df.sort_values(by=['name', 'date'])
df['actual'] = df.groupby(['name'])['value'].transform(lambda x: x.diff())
Upvotes: 2