Efficient way of conditional value setting

Question

I have a pandas DataFrame of operations in my application:

+----------+-------------+
| UserName |  StartEdit  |
+----------+-------------+
| John     | 12-Jul-2015 |
| David    | 16-Aug-2015 |
| Katie    | 20-Aug-2015 |
| Cristin  | 2-Sep-2015  |
| Katie    | 12-Sep-2015 |
| John     | 23-Nov-2015 |
| David    | 2-Jan-2016  |
| David    | 3-Jan-2016  |
| John     | 10-Feb-2016 |
| Steven   | 13-Mar-2016 |
| Steven   | 14-Mar-2016 |
+----------+-------------+

I would like to create another column, UserTeam. I know that Katie, Cristin and Steven have each stayed on a single team the whole time:

owners_teams = {"Katie":"A", "Cristin":"B", "Steven":"C"}

So when I do df["UserTeam"] = df["UserName"].map(owners_teams) I get:

+----------+-------------+----------+
| UserName |  StartEdit  | UserTeam |
+----------+-------------+----------+
| John     | 12-Jul-2015 | NaN      |
| David    | 16-Aug-2015 | NaN      |
| Katie    | 20-Aug-2015 | A        |
| Cristin  | 2-Sep-2015  | B        |
| Katie    | 12-Sep-2015 | A        |
| John     | 23-Nov-2015 | NaN      |
| David    | 2-Jan-2016  | NaN      |
| David    | 3-Jan-2016  | NaN      |
| John     | 10-Feb-2016 | NaN      |
| Steven   | 13-Mar-2016 | C        |
| Steven   | 14-Mar-2016 | C        |
+----------+-------------+----------+

Now, I also know that:

John moved from A to C on 01-Jan-2016

David moved from B to C on 12-Dec-2015

changes = [("John", "01-Jan-2016", "A", "C"), ("David", "12-Dec-2015", "B", "C")]

I know how to do it with apply, looping over the rows and hardcoding all the rules, but I don't think that's efficient. How do I do it in a vectorized way for a large number of users?

Expected result:

+----------+-------------+----------+
| UserName |  StartEdit  | UserTeam |
+----------+-------------+----------+
| John     | 12-Jul-2015 | A        |
| David    | 16-Aug-2015 | B        |
| Katie    | 20-Aug-2015 | A        |
| Cristin  | 2-Sep-2015  | B        |
| Katie    | 12-Sep-2015 | A        |
| John     | 23-Nov-2015 | A        |
| David    | 2-Jan-2016  | C        |
| David    | 3-Jan-2016  | C        |
| John     | 10-Feb-2016 | C        |
| Steven   | 13-Mar-2016 | C        |
| Steven   | 14-Mar-2016 | C        |
+----------+-------------+----------+

jpp · Accepted Answer

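One way is to use pd.DataFrame.loc within a for loop over changes. The loop runs once per change rather than once per row, and each assignment is vectorized, but you should test whether this is fast enough for your use case. Note that the date comparisons only work chronologically if StartEdit holds datetimes, not strings.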

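import pandas as pd

changes = [("John", "01-Jan-2016", "A", "C"), ("David", "12-Dec-2015", "B", "C")]

# Parse dates first: comparing the raw strings would be lexicographic
df['StartEdit'] = pd.to_datetime(df['StartEdit'], format='%d-%b-%Y')

for name, date, before, after in changes:
    name_mask = df['UserName'] == name
    cutoff = pd.to_datetime(date, format='%d-%b-%Y')
    df.loc[name_mask & (df['StartEdit'] < cutoff), 'UserTeam'] = before
    df.loc[name_mask & (df['StartEdit'] >= cutoff), 'UserTeam'] = after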

You can also perform the equivalent mapping via numpy.where.
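A minimal sketch of what the numpy.where version might look like. The small sample frame, the cutoff variable and the "%d-%b-%Y" format string are assumptions for illustration; the loop structure mirrors the one above.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; Katie's team is already known, the rest are not.
df = pd.DataFrame({
    "UserName": ["John", "David", "John", "David", "Katie"],
    "StartEdit": ["12-Jul-2015", "16-Aug-2015", "10-Feb-2016",
                  "2-Jan-2016", "20-Aug-2015"],
    "UserTeam": [None, None, None, None, "A"],
})
df["StartEdit"] = pd.to_datetime(df["StartEdit"], format="%d-%b-%Y")

changes = [("John", "01-Jan-2016", "A", "C"), ("David", "12-Dec-2015", "B", "C")]

for name, date, before, after in changes:
    cutoff = pd.to_datetime(date, format="%d-%b-%Y")
    name_mask = (df["UserName"] == name).to_numpy()
    # Team implied by this change for every row, regardless of user.
    team = np.where(df["StartEdit"] < cutoff, before, after)
    # Keep the existing value unless the row belongs to this user.
    df["UserTeam"] = np.where(name_mask, team, df["UserTeam"])
```

This still iterates once per change, so it is not obviously faster than the loc version; it only swaps two masked assignments for a single nested selection.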

`pd.merge_asof`
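One way pd.merge_asof could be applied to this problem, as a sketch rather than a definitive implementation: build a table of team assignments with the date each one takes effect, then match every edit to the latest assignment at or before it. The history table, the EffectiveFrom column and the EPOCH sentinel date are hypothetical names introduced here, and the construction assumes at most one change per user.

```python
import pandas as pd

FMT = "%d-%b-%Y"

df = pd.DataFrame({
    "UserName": ["John", "David", "Katie", "John", "David", "John", "Steven"],
    "StartEdit": ["12-Jul-2015", "16-Aug-2015", "20-Aug-2015", "23-Nov-2015",
                  "2-Jan-2016", "10-Feb-2016", "13-Mar-2016"],
})
df["StartEdit"] = pd.to_datetime(df["StartEdit"], format=FMT)

owners_teams = {"Katie": "A", "Cristin": "B", "Steven": "C"}
changes = [("John", "01-Jan-2016", "A", "C"), ("David", "12-Dec-2015", "B", "C")]

# Hypothetical sentinel: any date earlier than every edit in the data.
EPOCH = "01-Jan-1900"

# One row per (user, date the team takes effect, team).
rows = [(name, EPOCH, team) for name, team in owners_teams.items()]
for name, date, before, after in changes:
    rows += [(name, EPOCH, before), (name, date, after)]

history = pd.DataFrame(rows, columns=["UserName", "EffectiveFrom", "UserTeam"])
history["EffectiveFrom"] = pd.to_datetime(history["EffectiveFrom"], format=FMT)

# Both frames must be sorted on their date keys. The default backward
# direction picks the latest EffectiveFrom <= StartEdit for each user.
result = pd.merge_asof(
    df.sort_values("StartEdit"),
    history.sort_values("EffectiveFrom"),
    left_on="StartEdit",
    right_on="EffectiveFrom",
    by="UserName",
).drop(columns="EffectiveFrom")
```

Unlike the loop, this does a single join regardless of how many users changed teams, which may scale better for a large changes list. The result comes back ordered by StartEdit with a fresh index.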
