Elision

Reputation: 27

Applying conditional to grouped data

I asked a similar question for R before, and I'm now trying to replicate the same task in Python. The solution I got in the post below is similar to what I'm looking for.

Using sapply on column with missing values

Basically I need to create a new column conditionally based on grouped data.

Here's some sample data:

import pandas as pd

test = pd.DataFrame(data={
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
})

I now want to create a new column, "new_index", that equals index before time 3 but grows at a different rate, say 10% per time step, starting from time 3. The data should then look like this:

test2 = pd.DataFrame(data={
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
    "new_index": [1, 1.1, 1.4, 1.54, 1.694, 1.8634, 1, 1.4, 1.5, 1.65, 1.815, 1.9965],
})
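
To make the arithmetic behind those new_index values explicit: from time 3 onward, each value is the previous new_index times 1.1, starting from the index at time 2. A minimal sketch of that calculation in plain Python follows; the names base, rate and expected are only for illustration.

```python
# Value of "index" at time 2 for each group, taken from the sample data.
base = {1: 1.4, 2: 1.5}
rate = 0.10  # the 10% growth rate used in the example

# Compound the time-2 value for times 3, 4 and 5 (k = 1, 2, 3 steps).
expected = {
    group: [round(start * (1 + rate) ** k, 4) for k in (1, 2, 3)]
    for group, start in base.items()
}
print(expected)
```

This should reproduce the last three new_index values of each group in test2.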

I tried some code like this, but it doesn't work:

def gr_adj(df):
    if df["time"] <= 2:
        return df["index"]
    else:
        return np.cumprod(df["new_index"])

test["new_index"] = test.groupby("Group",group_keys=False).apply(gr_adj)

Any help is greatly appreciated, thanks!

Upvotes: 1

Views: 69

Answers (2)

d_kennetz

Reputation: 5359

Here is another answer that actually increases your index by 10% if the time is >= 3:

import pandas as pd

test = pd.DataFrame(data={"Group":[1,1,1,1,1,1,2,2,2,2,2,2],"time": [0,1,2,3,4,5,0,1,2,3,4,5],"index": [1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95]})

def gr_adj(row):
    if row["time"] <= 2:
        return row["index"]
    else:
        return row["index"] + (row["index"] * 0.1)

test["new_index"] = test.apply(gr_adj, axis=1)

outputs:

    Group  time  index  new_index
0       1     0   1.00      1.000
1       1     1   1.10      1.100
2       1     2   1.40      1.400
3       1     3   1.50      1.650
4       1     4   1.60      1.760
5       1     5   1.67      1.837
6       2     0   1.00      1.000
7       2     1   1.40      1.400
8       2     2   1.50      1.500
9       2     3   1.60      1.760
10      2     4   1.93      2.123
11      2     5   1.95      2.145

This passes each row to the function (apply with axis=1). new_index is set to index plus 10% of that same row's index when time >= 3, and is left equal to index otherwise.
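
The same row-wise rule can probably be written without apply, as a vectorized expression. The sketch below assumes NumPy is available and uses np.where; it is meant as an equivalent of the function above, not as a different calculation.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(data={
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
})

# Keep index where time <= 2; otherwise add a flat 10% to that row's own index.
test["new_index"] = np.where(test["time"] <= 2, test["index"], test["index"] * 1.1)
print(test)
```

Like the apply version, this adds 10% to each row independently, so it gives 1.65, 1.76, 1.837 for group 1 rather than the compounded 1.54, 1.694, 1.8634 shown in the question.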

Upvotes: 1

BENY

Reputation: 323226

Here is one way using cumprod. First, replace every index value where time is 3 or more with 1.1. Then keep only the rows from time 2 onward, so the rows that need no update are left out. Finally, group by Group, take the cumprod, and assign the result back to those rows.

s=test['index'].where(test['time']<3,1.1).loc[test['time']>=2].groupby(test['Group']).cumprod()
test.loc[test['time']>=2,'index']=s
test
Out[290]: 
    Group  time   index
0       1     0  1.0000
1       1     1  1.1000
2       1     2  1.4000
3       1     3  1.5400
4       1     4  1.6940
5       1     5  1.8634
6       2     0  1.0000
7       2     1  1.4000
8       2     2  1.5000
9       2     3  1.6500
10      2     4  1.8150
11      2     5  1.9965
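
Note that the code above overwrites the index column. If the original values should be kept and the result written to new_index, as in the question, a variant along these lines might work. It assumes rows are sorted by time within each group and that every group has a row at time 2; the names late, from_anchor and factors are only illustrative.

```python
import pandas as pd

test = pd.DataFrame(data={
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
})

late = test["time"] >= 3         # rows that should grow by 10% per step
from_anchor = test["time"] >= 2  # also include time 2 as the starting value

# Growth factors: the real index at time 2, then 1.1 for every later row.
factors = test["index"].mask(late, 1.1)[from_anchor]

# Start from a copy of index, then overwrite the rows from time 2 onward
# with the per-group cumulative product of the factors.
test["new_index"] = test["index"]
test.loc[from_anchor, "new_index"] = (
    factors.groupby(test.loc[from_anchor, "Group"]).cumprod()
)
print(test)
```

The value at time 2 is multiplied by nothing, so it stays equal to index there, and each later row is the previous new_index times 1.1.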

Upvotes: 1
