Reputation: 2573
In the following data frame DF, each user has a different number of rows in the Movie and Exist columns. For example, user 2 has 10 rows and user 5 has 9. For each user, I want the position of the first True value in the Exist column (counted within that user's rows), divided by the number of rows for that user, stored in a separate data frame along with the User ID. Imagine this is the data frame:
User Movie Exist
0 2 172 False
1 2 2717 False
2 2 150 False
3 2 2700 False
4 2 2699 True
5 2 2616 False
6 2 112 False
7 2 2571 True
8 2 2657 True
9 2 2561 False
10 5 3471 False
11 5 187 False
12 5 2985 False
13 5 3388 False
14 5 3418 False
15 5 32 False
16 5 1673 False
17 5 3740 True
18 5 1693 False
So the target data frame should look like this (5/10 = 0.5 and 8/9 ≈ 0.89):
User Location
2 0.5
5 0.89
This is because the first True value for user 2 is at relative position 5 (the 5th value in user 2's vector), and the first True value for user 5 is at relative position 8 (the 8th value in user 5's vector). Note that I don't want the actual index values, which are 4 and 17.
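To make the example reproducible, here is a minimal sketch that rebuilds the sample frame. The values are copied from the table above; the variable name `df` and the `expected` dictionary are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9,
    "Movie": [172, 2717, 150, 2700, 2699, 2616, 112, 2571, 2657, 2561,
              3471, 187, 2985, 3388, 3418, 32, 1673, 3740, 1693],
    "Exist": [False, False, False, False, True, False, False, True, True, False,
              False, False, False, False, False, False, False, True, False],
})

# Hand-computed targets: 1-based position of the first True / rows per user.
expected = {2: 5 / 10, 5: 8 / 9}
```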
Upvotes: 3
Views: 93
Reputation: 294526
Option 1
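def first_ratio(x):
    # Renumber the group 0..n-1 so idxmax returns a position, not the original label
    x = x.reset_index(drop=True)
    # 1-based position of the first True; 0 if the group has no True at all
    i = x.any() * (x.idxmax() + 1.)
    l = len(x)
    return i / l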
df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()
      Location
User          
2     0.500000
5     0.888889
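The `reset_index(drop=True)` step matters because `idxmax` returns an index label, not a position. A small sketch of the difference, using a hypothetical Series `s` that stands in for user 5's Exist values with their original labels 10 to 18:

```python
import pandas as pd

# Hypothetical stand-in for user 5's group: 9 values labelled 10..18.
s = pd.Series([False] * 7 + [True, False], index=range(10, 19))

label = s.idxmax()                        # label of the first True in the original index
pos = s.reset_index(drop=True).idxmax()   # 0-based position within the group
ratio = (pos + 1) / len(s)                # 1-based position divided by group length
```

Here `label` is the "real" index the question wants to avoid, while `pos + 1` is the relative position.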
Option 2
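def first_ratio(x):
    # Work on the underlying NumPy array, so positions are already 0-based
    v = x.values
    # argmax gives the position of the first True; any() zeroes the result if there is none
    i = v.any() * (v.argmax() + 1.)
    l = v.shape[0]
    return i / l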
df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()
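The `any()` factor covers a user with no True at all: `argmax` on an all-False array returns 0, which would otherwise be reported as position 1. A quick sketch of that edge case, with the same logic applied to plain lists (the `np.asarray` conversion and the names `with_true` and `without_true` are assumptions for illustration):

```python
import numpy as np

def first_ratio(v):
    # Same logic as Option 2, applied to a plain boolean array.
    v = np.asarray(v, dtype=bool)
    i = v.any() * (v.argmax() + 1.)
    return i / v.shape[0]

with_true = first_ratio([False, False, True, False])   # first True is the 3rd of 4 values
without_true = first_ratio([False, False, False])      # no True, so the ratio collapses to 0
```

Returning 0 for "no True found" is a design choice of this approach. If such users should be left out or marked as missing instead, the result would need to be filtered afterwards.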
Upvotes: 2