Reputation: 27
I've asked a similar question for R before and am now trying to do the same task in Python. The solution from that post is close to what I'm looking for:
Using sapply on column with missing values
Basically, I need to create a new column whose values depend on a condition and are computed within each group.
Here's some sample data:
import pandas as pd
test = pd.DataFrame(data={"Group": [1,1,1,1,1,1,2,2,2,2,2,2],
                          "time": [0,1,2,3,4,5,0,1,2,3,4,5],
                          "index": [1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95]})
I now want to create a new column, "new_index", that equals index up to time 2 but, starting at time 3, grows at a different rate instead, say 10% per period, compounding from the time-2 value within each Group. The data should then look like this:
test2 = pd.DataFrame(data={"Group": [1,1,1,1,1,1,2,2,2,2,2,2],
                           "time": [0,1,2,3,4,5,0,1,2,3,4,5],
                           "index": [1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95],
                           "new_index": [1,1.1,1.4,1.54,1.694,1.8634,1,1.4,1.5,1.65,1.815,1.9965]})
I tried code like this, but it doesn't work:
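import numpy as np

def gr_adj(df):
    # df is one Group's rows, so df["time"] <= 2 is a boolean Series;
    # using it in `if` raises "The truth value of a Series is ambiguous"
    if df["time"] <= 2:
        return df["index"]
    else:
        # "new_index" does not exist yet, so this would raise a KeyError
        return np.cumprod(df["new_index"])

test["new_index"] = test.groupby("Group", group_keys=False).apply(gr_adj)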
Any help is greatly appreciated, thanks!
Upvotes: 1
Views: 69
Reputation: 5359
Here is another answer that actually increases your index by 10% when time is 3 or later (time > 2):
import pandas as pd
test = pd.DataFrame(data={"Group":[1,1,1,1,1,1,2,2,2,2,2,2],"time": [0,1,2,3,4,5,0,1,2,3,4,5],"index": [1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95]})
def gr_adj(row):
    if row["time"] <= 2:
        return row["index"]
    else:
        return row["index"] + (row["index"] * 0.1)

test["new_index"] = test.apply(gr_adj, axis=1)
Output:
Group time index new_index
0 1 0 1.00 1.000
1 1 1 1.10 1.100
2 1 2 1.40 1.400
3 1 3 1.50 1.650
4 1 4 1.60 1.760
5 1 5 1.67 1.837
6 2 0 1.00 1.000
7 2 1 1.40 1.400
8 2 2 1.50 1.500
9 2 3 1.60 1.760
10 2 4 1.93 2.123
11 2 5 1.95 2.145
This passes each row to the function via apply(axis=1): new_index is index when time <= 2 and index + 10% (index * 1.1) when time >= 3.
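The same per-row rule can be sketched without apply(axis=1) by using numpy.where on whole columns, which avoids calling a Python function once per row. Like gr_adj here, it scales each row's own index rather than compounding from the time-2 value, so it reproduces the output table above.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(data={"Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                          "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
                          "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67,
                                    1, 1.4, 1.5, 1.6, 1.93, 1.95]})

# Vectorised form of the row-wise rule: keep index for time <= 2,
# otherwise use index * 1.1 (index + 10%).
test["new_index"] = np.where(test["time"] <= 2, test["index"], test["index"] * 1.1)
print(test)
```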
Upvotes: 1
Reputation: 323226
Here is one way using cumprod: first, replace index with 1.1 wherever time is 3 or later; then keep only the rows from time 2 onward (time 2 supplies the starting value, the later rows the 1.1 growth factors); then groupby Group and take the cumprod; finally, assign the result back.
s=test['index'].where(test['time']<3,1.1).loc[test['time']>=2].groupby(test['Group']).cumprod()
test.loc[test['time']>=2,'index']=s
test
Out[290]:
Group time index
0 1 0 1.0000
1 1 1 1.1000
2 1 2 1.4000
3 1 3 1.5400
4 1 4 1.6940
5 1 5 1.8634
6 2 0 1.0000
7 2 1 1.4000
8 2 2 1.5000
9 2 3 1.6500
10 2 4 1.8150
11 2 5 1.9965
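The code above overwrites index in place. If you want the result in a separate new_index column, as in test2, a minimal variant of the same cumprod steps might look like this (the `after` and `tail` mask names are illustrative):

```python
import pandas as pd

test = pd.DataFrame(data={"Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                          "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
                          "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67,
                                    1, 1.4, 1.5, 1.6, 1.93, 1.95]})

after = test["time"] >= 3  # rows whose value should compound
tail = test["time"] >= 2   # time 2 supplies the starting value

# Replace index with the 1.1 growth factor from time 3 on, keep only rows
# from time 2 onward, then multiply cumulatively within each Group.
s = test["index"].where(~after, 1.1)[tail].groupby(test["Group"]).cumprod()

# Copy index, then overwrite the time >= 2 rows (aligned on the row labels).
test["new_index"] = test["index"]
test.loc[tail, "new_index"] = s
print(test)
```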
Upvotes: 1