A1122

Reputation: 1354

Apply a function only within the same row index?

I have a DataFrame with a sorted two-level index (col1, col2), and I want to apply diff to col3 only within each col1 group, in the order given by col2.

import pandas as pd

mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'], 'col2': [1, 2, 3, 4], 'col3': [1, 4, 7, 3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()
mini_df['diff'] = mini_df.col3.diff(1)

This gives me

           col3  diff
col1 col2            
A    1        1   NaN
     4        3   2.0
B    2        4   1.0
C    3        7   3.0

Above, diff is applied across all rows, ignoring the col1 groups. What I want is:

           col3  diff
col1 col2            
A    1        1   NaN
     4        3   2.0
B    2        4   NaN
C    3        7   NaN

Upvotes: 2

Views: 51

Answers (2)

busybear

Reputation: 10590

You'll want to use groupby to apply diff to each group:

import pandas as pd

mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'], 'col2': [1, 2, 3, 4], 'col3': [1, 4, 7, 3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()

# diff is computed separately within each value of the col1 index level
mini_df['diff'] = mini_df.groupby(level='col1')['col3'].diff()
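A self-contained sketch of the groupby approach, for checking the result against the desired output. The `transform(lambda g: g.diff())` line is not part of the answer; it is added here only as an equivalent spelling that makes the "apply Series.diff to each group" idea explicit. The `axis` argument is left out because it is deprecated for `groupby` in recent pandas versions (assumed pandas 1.x or 2.x).

```python
import pandas as pd

mini_df = pd.DataFrame({'col1': ['A', 'B', 'C', 'A'], 'col2': [1, 2, 3, 4], 'col3': [1, 4, 7, 3]})
mini_df = mini_df.set_index(['col1', 'col2']).sort_index()

# diff within each col1 group; the result is aligned back to the original index
mini_df['diff'] = mini_df.groupby(level='col1')['col3'].diff()

# Equivalent spelling: transform applies Series.diff to each group separately
alt = mini_df.groupby(level='col1')['col3'].transform(lambda g: g.diff())

print(mini_df)
```

Groups with a single row (B and C here) have no previous row, so their diff is NaN, which matches the desired output.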

Upvotes: 2

ALollz

Reputation: 59549

Since you've already done the heavy lifting of sorting, you can take the plain diff and keep it only where a row belongs to the same group as the row before it. You can't shift a non-datetime index, so either wrap the level values in a Series, or use np.roll; note that np.roll wraps around, which would lead to the wrong answer for a single-group DataFrame.

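import pandas as pd

# Wrap the col1 level values in a Series so they can be shifted
s = pd.Series(mini_df.index.get_level_values('col1'))
# Keep the plain diff only where a row has the same col1 value as the row before it
mini_df['diff'] = mini_df.col3.diff().where(s.eq(s.shift(1)).values)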

           col3  diff
col1 col2            
A    1        1   NaN
     4        3   2.0
B    2        4   NaN
C    3        7   NaN

Upvotes: 1
