Different diff operations on different columns

Question

I want to apply a different diff() operation to each column of a pandas DataFrame. Below is an example that uses an if statement in a lambda function to take diff(1) on col1 and diff(2) on col2.

import pandas as pd

data = pd.DataFrame({'col1':[32,42,54,62,76,76,87,98,122,111,132,134,134,156],
                    'col2':[32,58,59,63,65,72,95,100,102,101,232,234,234,256]})

data.apply(lambda x: x.diff(1) if x.name=='col1' else x.diff(2))

I was first thinking about a solution with a dictionary, similar to the agg function. That would be easier when there are more than two columns. Does anyone have a handy method for applying different diff() operations to different columns?

jezrael · Accepted Answer

If every operation returns a Series of the same size as the original column, as diff or cumsum do, it is possible to use DataFrame.agg:

df = data.agg({'col1':lambda x: x.diff(), 'col2':lambda x: x.diff(2)})
print (df)
    col1   col2
0    NaN    NaN
1   10.0    NaN
2   12.0   27.0
3    8.0    5.0
4   14.0    6.0
5    0.0    9.0
6   11.0   30.0
7   11.0   28.0
8   24.0    7.0
9  -11.0    1.0
10  21.0  130.0
11   2.0  133.0
12   0.0    2.0
13  22.0   22.0
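
Since every function here returns a result of the same length as its input, the same dictionary can probably also be passed to DataFrame.transform, which is meant for exactly that case. This is only a minimal sketch, not part of the original answer, and the name out is just illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
    'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256],
})

# Map each column label to the function that should be applied to it.
# transform requires every function to return a result of the original length.
out = data.transform({'col1': lambda x: x.diff(), 'col2': lambda x: x.diff(2)})
print(out.head())
```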

Mixing a same-size operation such as diff with an aggregation such as mean in one agg call fails:

df = data.agg({'col1':lambda x: x.diff(), 'col2':'mean'})
print (df)

ValueError: cannot perform both aggregation and transformation operations simultaneously
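
If only the number of periods differs per column, one way to get the dictionary-style interface asked for in the question is to map column names to periods and build the result with a dict comprehension. This is a sketch under that assumption; the periods mapping and the name out are hypothetical and not from the original answer:

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
    'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256],
})

# Hypothetical mapping: column name -> number of periods to difference by
periods = {'col1': 1, 'col2': 2}

# Call Series.diff once per column and reassemble the pieces into a DataFrame
out = pd.DataFrame({col: data[col].diff(n) for col, n in periods.items()})
print(out.head())
```

This avoids lambdas entirely and scales to many columns by extending the mapping.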
