Reputation: 1354
I have a DataFrame with a sorted two-level index, and I want to apply diff to a column only within each col1 group, in the order sorted by col2.
import pandas as pd
mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'], 'col2': [1,2,3,4], 'col3': [1,4,7,3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()
mini_df['diff'] = mini_df.col3.diff(1)
This gives me
           col3  diff
col1 col2
_____________________
A    1        1   nan
     4        3     2
B    2        4     1
C    3        7     3
This applies diff row by row over the whole column, ignoring the col1 groups.
What I want is
           col3  diff
col1 col2
_____________________
A    1        1   nan
     4        3     2
B    2        4   nan
C    3        7   nan
Upvotes: 2
Views: 51
Reputation: 10590
You'll want to use groupby to apply diff to each group:
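mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'], 'col2': [1,2,3,4], 'col3': [1,4,7,3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()
# Group on the col1 index level so diff restarts at the top of each group.
# (axis=0 is the default and is deprecated in recent pandas, so it is omitted.)
mini_df['diff'] = mini_df.groupby(level='col1')['col3'].diff()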
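As a quick sanity check, here is a minimal sketch that rebuilds the example frame and compares a grouped diff against the result the question asks for. The names grouped_diff, expected and matches are made up for illustration, and the expected values are typed in by hand from the question's desired output rather than produced by pandas.

```python
import numpy as np
import pandas as pd

mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'],
                        'col2': [1, 2, 3, 4],
                        'col3': [1, 4, 7, 3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()

# diff() restarts at the first row of every col1 group, so each group
# begins with NaN instead of borrowing the previous group's last value.
grouped_diff = mini_df.groupby(level='col1')['col3'].diff()

# Desired values copied by hand from the question, in sorted index order.
expected = [np.nan, 2.0, np.nan, np.nan]

# equal_nan=True makes the NaN positions count as equal.
matches = np.allclose(grouped_diff.to_numpy(), expected, equal_nan=True)
```

The result keeps the original MultiIndex, so it can be assigned straight back as a new column.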
Upvotes: 2
Reputation: 59549
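Since you've already done the heavy lifting of sorting, you can diff the whole column and keep only the values where the previous row belongs to the same group. Index.shift only works on datetime-like indexes, so either build a Series from the index level, or use np.roll. Note that np.roll wraps around (it compares the first row with the last), so it marks the first row incorrectly when the DataFrame contains a single group.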
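import pandas as pd
# col1 label of each row as a Series, so it can be shifted by one row.
s = pd.Series(mini_df.index.get_level_values('col1'))
# Keep the diff only where the previous row has the same col1 label.
mini_df['diff'] = mini_df.col3.diff().where(s.eq(s.shift(1)).values)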
           col3  diff
col1 col2
A    1        1   NaN
     4        3   2.0
B    2        4   NaN
C    3        7   NaN
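The np.roll route mentioned in this answer isn't shown, so here is one possible sketch of it. The names labels, same_group and roll_diff are made up for illustration, and resetting the first mask entry by hand is my own workaround for the wrap-around, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'],
                        'col2': [1, 2, 3, 4],
                        'col3': [1, 4, 7, 3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()

# Group label of every row, as a plain NumPy array.
labels = mini_df.index.get_level_values('col1').to_numpy()

# np.roll(labels, 1) moves each label down one row and wraps the last
# label around to position 0, so row 0 gets compared with the final row.
same_group = labels == np.roll(labels, 1)

# Row 0 has no predecessor; overwrite the wrapped comparison explicitly.
same_group[0] = False

# Keep the plain diff only where the previous row is in the same group.
roll_diff = mini_df['col3'].diff().where(same_group)
```

This sketch assumes a non-empty frame, since it indexes the first mask entry directly.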
Upvotes: 1