Reputation: 485
Lets say I have something that looks like this
df = pd.DataFrame({'Event':['A','A','A','A', 'A' ,'B','B','B','B','B'], 'Number':[1,2,3,4,5,6,7,8,9,10],'Ref':[False,False,False,False,True,False,False,False,True,False]})
What I want to do is create a new column which is the difference in Number from the True in ref. So for the A group, the True is the last one, so the column would read -4,-3,-2,-1,0. I have been thinking to do the following:
for col in df.groupby('Event'):
temp = col[1]
reference = temp[temp.Ref==True]
dist1 = temp.apply(lambda x:x.Number-reference.Number,axis=1)
This seems to correctly calculate for each group, but I am not sure how to join the result into the df.
Upvotes: 3
Views: 390
Reputation: 323226
In your case
df['new']=(df.set_index('Event').Number-df.query('Ref').set_index('Event').Number).to_numpy()
df
Event Number Ref new
0 A 1 False -4
1 A 2 False -3
2 A 3 False -2
3 A 4 False -1
4 A 5 True 0
5 B 6 False -3
6 B 7 False -2
7 B 8 False -1
8 B 9 True 0
9 B 10 False 1
Upvotes: 3
Reputation: 25239
Try where
and grouby transform first
s = df.Number.where(df.Ref).groupby(df.Event).transform('first')
df.Number - s
Out[319]:
0 -4.0
1 -3.0
2 -2.0
3 -1.0
4 0.0
5 -3.0
6 -2.0
7 -1.0
8 0.0
9 1.0
Name: Number, dtype: float64
Upvotes: 1
Reputation: 61900
You could do the following:
df["new"] = df.Number - df.Number[df.groupby('Event')['Ref'].transform('idxmax')].reset_index(drop=True)
print(df)
Output
Event Number Ref new
0 A 1 False -4
1 A 2 False -3
2 A 3 False -2
3 A 4 False -1
4 A 5 True 0
5 B 6 False -3
6 B 7 False -2
7 B 8 False -1
8 B 9 True 0
9 B 10 False 1
This: df.groupby('Event')['Ref'].transform('idxmax')
fill find the indices by group where Ref is True
. Basically it finds the indices of the max values, so given that True = 1, and False = 0, it find the indices of the True values.
Upvotes: 2