Reputation: 89
Hi I stumble upon on problem with pandas dataframe. I have this dataframe.
user event diff days
1001 request 45
1001 approve 43
1002 request 44
1002 approve 43
1002 request 42
1003 approve 41
1003 request 40
1003 approve 39
1003 request 38
I would like to create new column Flag based this rule: we define Problem as not making another request within 45 days after last request for each user.
The output should look like this.
user event diff Days flag
1001 request 45 Problem
1001 approve 43 Problem
1002 request 44 NoProblem
1002 approve 43 NoProblem
1002 request 42 NoProblem
1003 approve 41 NoProblem
1003 request 40 NoProblem
1003 approve 39 NoProblem
1003 request 38 NoProblem
I cannot come up with solution using np.where or np.select. Any kinds of solution are welcome.
In case you want to play around with this, you can use this to reproduce my dataframe above.
df = pd.DataFrame(data={'user': [1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003, 1003],
'event': ['request', 'approve', 'request', 'approve', 'request', 'approve', 'request', 'approve', 'request'],
'Diff Days': [45, 43, 44, 43, 42, 41, 40, 39, 38]})
N.B. I need to flag as problem only cases where we are not making another request within 45 days after last request (column diff days) for each user and event(I imply only request). The rest of fields from event do not play a role.
Upvotes: 2
Views: 2395
Reputation: 6333
EDIT:
In the comments, you said you want to flag users where event
is request
and Diff Days >= 45
. Here's how:
users_with_problems = df.loc[(df["event"] == "request") & (df["Diff Days"] >= 45), "users"]
df["Flag"] = "No problem"
df.loc[df["user"].isin(users_with_problem), "Flag"] = "Problem"
END OF EDIT.
It seems you want Flag
to display Problem
across all the rows of a user who had a single problem. To do this:
df["Flag"] = "No problem"
users_with_problem = df.loc[df["Diff Days"] >= 45, "user"]
df.loc[df["user"].isin(users_with_problem), "Flag"] = "Problem"
This returns:
user event Diff Days Flag
1001 request 45 Problem
1001 approve 43 Problem
1002 request 44 No problem
1002 approve 43 No problem
1002 request 42 No problem
1003 approve 41 No problem
1003 request 40 No problem
1003 approve 39 No problem
1003 request 38 No problem
However, you said you only want to flag a problem if diff
is 45 or more. You can do this as follows:
df["Flag"] = "No Problem"
df.loc[df["Diff Days"] >= 45, "Flag"] = "Problem"
This will only flag the users in the rows where diff >= 45
:
user event Diff Days Flag
1001 request 45 Problem
1001 approve 43 No problem
1002 request 44 No problem
1002 approve 43 No problem
1002 request 42 No problem
1003 approve 41 No problem
1003 request 40 No problem
1003 approve 39 No problem
1003 request 38 No problem
Upvotes: 5