Henri

Reputation: 1235

Different diff operations on different columns

I want to apply a different diff() operation to each column of a pandas DataFrame. The example below uses an if statement inside a lambda function to take diff(1) on col1 and diff(2) on col2.

import pandas as pd

data = pd.DataFrame({'col1':[32,42,54,62,76,76,87,98,122,111,132,134,134,156],
                    'col2':[32,58,59,63,65,72,95,100,102,101,232,234,234,256]})

data.apply(lambda x: x.diff(1) if x.name=='col1' else x.diff(2))

My first thought was a dictionary-based solution, similar to the dict that the agg function accepts. That would be easier when there are more than two columns. Does anyone know a handy way to apply different diff() operations to different columns?

Upvotes: 0

Views: 61

Answers (2)

jezrael

Reputation: 862751

If every operation returns a Series of the same size as the original column, as diff or cumsum do, you can use DataFrame.agg:

df = data.agg({'col1':lambda x: x.diff(), 'col2':lambda x: x.diff(2)})
print (df)
    col1   col2
0    NaN    NaN
1   10.0    NaN
2   12.0   27.0
3    8.0    5.0
4   14.0    6.0
5    0.0    9.0
6   11.0   30.0
7   11.0   28.0
8   24.0    7.0
9  -11.0    1.0
10  21.0  130.0
11   2.0  133.0
12   0.0    2.0
13  22.0   22.0

Mixing a transformation with a reduction such as 'mean' in the same call fails, however:

df = data.agg({'col1':lambda x: x.diff(), 'col2':'mean'})
print (df)

ValueError: cannot perform both aggregation and transformation operations simultaneously
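
If you do need both kinds of result, one possible workaround is to compute them separately instead of in a single agg call. This is only a minimal sketch on a shortened copy of the question's data; the variable names col1_diff and col2_mean are made up for illustration.

```python
import pandas as pd

# Shortened copy of the question's data (first four rows only).
data = pd.DataFrame({'col1': [32, 42, 54, 62],
                     'col2': [32, 58, 59, 63]})

# Transformation: the result has the same length as the input column.
col1_diff = data['col1'].diff()

# Reduction: the result is a single scalar.
col2_mean = data['col2'].mean()

print(col1_diff)
print(col2_mean)
```

Keeping the two results in separate objects avoids forcing a full-length Series and a scalar into the same DataFrame.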

Upvotes: 1

mozway

Reputation: 260790

One easy option could be to use a dictionary to hold the periods:

periods = {'col1': 1, 'col2': 2}

data.apply(lambda c: c.diff(periods[c.name]))

output:

    col1   col2
0    NaN    NaN
1   10.0    NaN
2   12.0   27.0
3    8.0    5.0
4   14.0    6.0
...
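
If the DataFrame has columns that are not listed in the dictionary, periods[c.name] raises a KeyError. A possible variation, sketched here under the assumption that unlisted columns should fall back to diff(1), is to use dict.get with a default. The column col3 and the name default_period are hypothetical and not part of the question.

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [32, 42, 54, 62, 76],
    'col2': [32, 58, 59, 63, 65],
    'col3': [1, 4, 9, 16, 25],  # hypothetical extra column, not in the question
})

periods = {'col1': 1, 'col2': 2}
default_period = 1  # assumption: unlisted columns get a plain diff(1)

# Look up the period by column name, falling back to the default.
result = data.apply(lambda c: c.diff(periods.get(c.name, default_period)))
print(result)
```

With more columns, only the periods dictionary needs to change; the apply call stays the same.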

Upvotes: 1
