IDontKnowAnything
IDontKnowAnything

Reputation: 213

How to calculate difference on grouped df?

name      date      value
 a      1/1/2011      3
 b      1/1/2011      5
 c      1/1/2011      7
 a      1/2/2011      6
 b      1/2/2011      10
 c      1/2/2011      14

I have a df here where the value is cumulative stats. So the Actual value of name: a date: 1/2/2011 is 3 not 6. To get the actual value of a particular day, I need to take that day's value minus the previous day's value. I want to calculate the actual value of each name for each date. Something along the lines of df.groupby(['name', 'date'])['value'].diff() but this code is returning error.

In the end what I need is

name      date   actual value
 a      1/1/2011      3
 b      1/1/2011      5
 c      1/1/2011      7
 a      1/2/2011      3
 b      1/2/2011      5
 c      1/2/2011      7

Upvotes: 1

Views: 88

Answers (2)

roganjosh
roganjosh

Reputation: 13175

This can be done in a single line and in a vectorized way.

import pandas as pd

df = pd.read_clipboard() # Reading from your question

df['value'] = df.groupby('name')['value'].diff(1).fillna(df['value'])

As was discussed in the comments, it is necessary to reference the original 'values Series when applying fillna to correctly replace the NaN values from diff (this occurs for the first instance of each label in 'name').

Upvotes: 3

Hubert Dudek
Hubert Dudek

Reputation: 1722

df['value'] = df['value'].fillna(method='ffill')
df = df.sort_values(by=['name', 'date'])
df['actual'] = df.groupby(['name'])['value'].transform(lambda x: x.diff())

Upvotes: 2

Related Questions