Amelia
Amelia

Reputation: 29

How to create a column based on two conditions from other data frame?

I'm trying to create a column that identifies if the row meets two conditions. For example, I have a table similar to this:

> dat <- data.frame(Date = c(rep(c("2019-01-01", "2019-02-01","2019-03-01", "2019-04-01"), 4)),
+                   Rep = c(rep("Mike", 4), rep("Tasha", 4), rep("Dane", 4), rep("Trish", 4)),
+                   Manager = c(rep("Amber", 2), rep("Michelle", 2), rep("Debbie", 4), rep("Brian", 4), rep("Tim", 3), "Trevor"),
+                   Sales = floor(runif(16, min = 0, max = 10)))
> dat                  
         Date   Rep  Manager Sales
1  2019-01-01  Mike    Amber     6
2  2019-02-01  Mike    Amber     3
3  2019-03-01  Mike Michelle     9
4  2019-04-01  Mike Michelle     2
5  2019-01-01 Tasha   Debbie     9
6  2019-02-01 Tasha   Debbie     6
7  2019-03-01 Tasha   Debbie     0
8  2019-04-01 Tasha   Debbie     4
9  2019-01-01  Dane    Brian     3
10 2019-02-01  Dane    Brian     6
11 2019-03-01  Dane    Brian     6
12 2019-04-01  Dane    Brian     1
13 2019-01-01 Trish      Tim     6
14 2019-02-01 Trish      Tim     7
15 2019-03-01 Trish      Tim     6
16 2019-04-01 Trish   Trevor     1

Out of the Reps that have switched manager, I would like to identify weather this manager is the first or the second manager with respect to the date. The ideal output would look something like:

         Date   Rep  Manager Sales  New_Column
1  2019-01-01  Mike    Amber     6           1
2  2019-02-01  Mike    Amber     3           1
3  2019-03-01  Mike Michelle     9           2
4  2019-04-01  Mike Michelle     2           2
5  2019-01-01 Trish      Tim     6           1
6  2019-02-01 Trish      Tim     7           1
7  2019-03-01 Trish      Tim     6           1
8  2019-04-01 Trish   Trevor     1           2

I have tried a few things but they're not quite working out. I have created two separate data frames where one consists of the first instance of that Rep and associated manager (df1) and the other one consists of the last instance of that rep and associated manager (df2). The code that I have tried that has gotten the closest is:

dat$New_Column <- ifelse(dat$Rep %in% df1$Rep & dat$Manager %in% df1$Manager, 1,
                         ifelse(dat$Rep %in% df2$Rep & dat$Manager %in% df2$Manager, 2, NA))

However this reads as two separate conditions, rather than having a condition of a condition (i.e. If Mike exists in the first instance and Amber exists in the first instance assign 1 rather than If Mike exists with the manager Amber in the first instance assign 1). Any help would be really appreciated. Thank you!

Upvotes: 1

Views: 36

Answers (1)

akrun
akrun

Reputation: 887771

An option is to first grouped by 'Rep' filter the rows where the number of unique 'Manager' is 2, and then add a column by matching the 'Manager' with the unique elements of 'Manager' to get the indices

library(dplyr)
dat %>% 
  group_by(Rep) %>% 
  filter(n_distinct(Manager) == 2) %>%
  mutate(New_Column = match(Manager, unique(Manager)))
# A tibble: 8 x 5
# Groups:   Rep [2]
#  Date       Rep   Manager  Sales New_Column
#  <chr>      <chr> <chr>    <int>      <int>
#1 2019-01-01 Mike  Amber        6          1
#2 2019-02-01 Mike  Amber        3          1
#3 2019-03-01 Mike  Michelle     9          2
#4 2019-04-01 Mike  Michelle     2          2
#5 2019-01-01 Trish Tim          6          1
#6 2019-02-01 Trish Tim          7          1
#7 2019-03-01 Trish Tim          6          1
#8 2019-04-01 Trish Trevor       1          2

Upvotes: 1

Related Questions