vasiop
vasiop

Reputation: 89

Create new column based on rule Pandas dataframe

Hi I stumble upon on problem with pandas dataframe. I have this dataframe.

user    event   diff days
1001    request    45
1001    approve    43
1002    request    44
1002    approve    43
1002    request    42
1003    approve    41
1003    request    40
1003    approve    39
1003    request    38

I would like to create new column Flag based this rule: we define Problem as not making another request within 45 days after last request for each user.

The output should look like this.

user    event   diff Days   flag
1001    request    45       Problem
1001    approve    43       Problem
1002    request    44       NoProblem
1002    approve    43       NoProblem
1002    request    42       NoProblem
1003    approve    41       NoProblem
1003    request    40       NoProblem
1003    approve    39       NoProblem
1003    request    38       NoProblem

I cannot come up with solution using np.where or np.select. Any kinds of solution are welcome.

In case you want to play around with this, you can use this to reproduce my dataframe above.

df = pd.DataFrame(data={'user': [1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003, 1003],
                           'event': ['request', 'approve', 'request', 'approve', 'request', 'approve', 'request', 'approve', 'request'],
                           'Diff Days': [45, 43, 44, 43, 42, 41, 40, 39, 38]})

N.B. I need to flag as problem only cases where we are not making another request within 45 days after last request (column diff days) for each user and event(I imply only request). The rest of fields from event do not play a role.

Upvotes: 2

Views: 2395

Answers (1)

Arturo Sbr
Arturo Sbr

Reputation: 6333

EDIT:

In the comments, you said you want to flag users where event is request and Diff Days >= 45. Here's how:

users_with_problems = df.loc[(df["event"] == "request") & (df["Diff Days"] >= 45), "users"]
df["Flag"] = "No problem"
df.loc[df["user"].isin(users_with_problem), "Flag"] = "Problem"

END OF EDIT.

It seems you want Flag to display Problem across all the rows of a user who had a single problem. To do this:

df["Flag"] = "No problem"
users_with_problem = df.loc[df["Diff Days"] >= 45, "user"]
df.loc[df["user"].isin(users_with_problem), "Flag"] = "Problem"

This returns:

user    event  Diff Days        Flag
1001  request         45     Problem
1001  approve         43     Problem
1002  request         44  No problem
1002  approve         43  No problem
1002  request         42  No problem
1003  approve         41  No problem
1003  request         40  No problem
1003  approve         39  No problem
1003  request         38  No problem

However, you said you only want to flag a problem if diff is 45 or more. You can do this as follows:

df["Flag"] = "No Problem"
df.loc[df["Diff Days"] >= 45, "Flag"] = "Problem"

This will only flag the users in the rows where diff >= 45:

user    event  Diff Days        Flag
1001  request         45     Problem
1001  approve         43  No problem
1002  request         44  No problem
1002  approve         43  No problem
1002  request         42  No problem
1003  approve         41  No problem
1003  request         40  No problem
1003  approve         39  No problem
1003  request         38  No problem

Upvotes: 5

Related Questions