Reputation: 344
Suppose I have a DataFrame of students' grades and want to track how each student's grades change over time. The DataFrame might look like this:
import pandas as pd

data = [
    {"Name": "John", "Period": 1, "Grade": 60},
    {"Name": "John", "Period": 2, "Grade": 80},
    {"Name": "John", "Period": 3, "Grade": 90},
    {"Name": "Bill", "Period": 1, "Grade": 80},
    {"Name": "Bill", "Period": 2, "Grade": 70},
    {"Name": "Bill", "Period": 3, "Grade": 80},
    {"Name": "Tom", "Period": 1, "Grade": 50},
    {"Name": "Tom", "Period": 2, "Grade": 75},
    {"Name": "Tom", "Period": 3, "Grade": 50},
]
df = pd.DataFrame(data)
df.set_index(["Name", "Period"], inplace=True)
             Grade
Name Period
John 1          60
     2          80
     3          90
Bill 1          80
     2          70
     3          80
Tom  1          50
     2          75
     3          50
Now I would like to add a "change" column that shows the percentage change from one exam to the next. The data is effectively several per-student DataFrames stacked on top of each other. If it were a single student's DataFrame, I would try something like
df["change"] = (df["Grade"] - df["Grade"].shift(1))/df["Grade"].shift(1)
This would correctly return a NaN value in the first row, since it has no previous value. Doing so on the above DataFrame yields:
             Grade    change
Name Period
John 1          60       NaN
     2          80  0.333333
     3          90  0.125000
Bill 1          80 -0.111111
     2          70 -0.125000
     3          80  0.142857
Tom  1          50 -0.375000
     2          75  0.500000
     3          50 -0.333333
Instead, I would like the first "change" value for each outer index value (each student) to be NaN, as below:
             Grade    change
Name Period
John 1          60       NaN
     2          80  0.333333
     3          90  0.125000
Bill 1          80       NaN
     2          70 -0.125000
     3          80  0.142857
Tom  1          50       NaN
     2          75  0.500000
     3          50 -0.333333
I also want this so that later, when aggregating the "change" column, there are no wild swings caused by one student's final grade feeding into the next student's first change. I am aware of the shortcut of applying the transformation above and then setting each student's first "change" value to np.nan, but I feel there must be a more elegant method.
Upvotes: 1
Views: 82
Reputation: 862591
Use GroupBy.pct_change, grouping by the first level of the MultiIndex:
df["change"] = df.groupby(level=0)['Grade'].pct_change()
print(df)
             Grade    change
Name Period
John 1          60       NaN
     2          80  0.333333
     3          90  0.125000
Bill 1          80       NaN
     2          70 -0.125000
     3          80  0.142857
Tom  1          50       NaN
     2          75  0.500000
     3          50 -0.333333
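For anyone who wants to run this end to end, here is a minimal self-contained sketch using the data from the question. The name first_rows is only for illustration; it collects each student's period-1 value so you can confirm they are all NaN.

```python
import pandas as pd

# Same data as in the question, built column-wise for brevity.
df = pd.DataFrame(
    {
        "Name": ["John"] * 3 + ["Bill"] * 3 + ["Tom"] * 3,
        "Period": [1, 2, 3] * 3,
        "Grade": [60, 80, 90, 80, 70, 80, 50, 75, 50],
    }
).set_index(["Name", "Period"])

# Percentage change computed separately within each student (index level 0).
df["change"] = df.groupby(level=0)["Grade"].pct_change()

# Illustrative check: the period-1 row of every student has no previous grade.
first_rows = df.xs(1, level="Period")["change"]
print(first_rows.isna().all())
```

Because the change is computed per group, Bill's first value no longer depends on John's last grade.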
Solution with DataFrameGroupBy.shift:
s = df.groupby(level=0)['Grade'].shift()
df["change"] = (df['Grade'] - s) / s
Or, equivalently, in one line:
df["change"] = df['Grade'].div(df.groupby(level=0)['Grade'].shift()).sub(1)
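The two shift-based forms are algebraically the same, since (g - s) / s = g / s - 1, and both should match pct_change up to floating-point rounding. A quick sketch to check this on the question's data (the names prev, via_difference, via_ratio and via_pct_change are made up for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John"] * 3 + ["Bill"] * 3 + ["Tom"] * 3,
        "Period": [1, 2, 3] * 3,
        "Grade": [60, 80, 90, 80, 70, 80, 50, 75, 50],
    }
).set_index(["Name", "Period"])

# Previous grade within each student; NaN on each student's first row.
prev = df.groupby(level=0)["Grade"].shift()

via_difference = (df["Grade"] - prev) / prev
via_ratio = df["Grade"].div(prev).sub(1)
via_pct_change = df.groupby(level=0)["Grade"].pct_change()

# Compare with a tolerance and treat NaN positions as equal.
same = np.allclose(via_difference, via_ratio, equal_nan=True) and np.allclose(
    via_ratio, via_pct_change, equal_nan=True
)
print(same)
```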
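And with GroupBy.apply (group_keys=False keeps the original index; newer pandas versions otherwise prepend the group key to the result, which breaks the assignment):
df["change"] = df.groupby(level=0, group_keys=False)['Grade'].apply(lambda x: (x - x.shift()) / x.shift())
Better:
df["change"] = df.groupby(level=0, group_keys=False)['Grade'].apply(lambda x: (x / x.shift()) - 1)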
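As a side note that is not part of the answer above: if you want a custom function per group, transform may be a safer fit than apply here, because its result is always aligned to the original index. A minimal sketch, assuming the same data as the question (expected is a name made up for the comparison):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John"] * 3 + ["Bill"] * 3 + ["Tom"] * 3,
        "Period": [1, 2, 3] * 3,
        "Grade": [60, 80, 90, 80, 70, 80, 50, 75, 50],
    }
).set_index(["Name", "Period"])

# transform returns a Series indexed like df, so it can be assigned directly.
df["change"] = df.groupby(level=0)["Grade"].transform(lambda x: x / x.shift() - 1)

# Cross-check against the built-in per-group pct_change.
expected = df.groupby(level=0)["Grade"].pct_change()
print(np.allclose(df["change"], expected, equal_nan=True))
```

For this particular calculation the built-in pct_change is still the simplest option; the lambda versions are mainly useful when the per-group formula is something pandas does not provide directly.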
Upvotes: 2