maribio18
maribio18

Reputation: 85

Identify the groups sharing a duplicated value

I have a data frame summarizing the deployment history of different pieces of equipment identified by a serial number. Over time, the equipment (serno) can be used by different projects but can only be deployed on one project at any given time. I generated a new column called used.elsewhere that identifies if the serno in that row was duplicated within the df. For example:

project = c("a","b","c","c","c","d","e")
serno = c(1,2,2,2,3,3,3)
deployed = c(T,T,F,F,F,F,T)
used.elsewhere = c(F,T,T,T,T,T,T)
data.frame(project,serno,deployed,used.elsewhere)

  project serno deployed used.elsewhere
1       a     1     TRUE          FALSE
2       b     2     TRUE           TRUE
3       c     2    FALSE           TRUE
4       c     2    FALSE           TRUE
5       c     3    FALSE           TRUE
6       d     3    FALSE           TRUE
7       e     3     TRUE           TRUE

I want to generate a new column that, if the serno value is not deployed and is used elsewhere, indicates the project where the serno is deployed:

project = c("a","b","c","c","c","d","e")
serno = c(1,2,2,2,3,3,3)
deployed = c(T,T,F,F,F,F,T)
used.elsewhere = c(F,T,T,T,T,T,T)
other.project = c(NA, NA, "b", "b", "e", "e", NA)

  project serno deployed used.elsewhere other.project
1       a     1     TRUE          FALSE          <NA>
2       b     2     TRUE           TRUE          <NA>
3       c     2    FALSE           TRUE             b
4       c     2    FALSE           TRUE             b
5       c     3    FALSE           TRUE             e
6       d     3    FALSE           TRUE             e
7       e     3     TRUE           TRUE          <NA>

I'm assuming that I could use an ifelse statement such as the following but I'm not sure how to finish it.

df %>%
  mutate(other.project = ifelse(deployed == F & used.elsewhere == T, ...

Thank-you in advance!

Upvotes: 1

Views: 46

Answers (2)

Ben
Ben

Reputation: 30494

If you first group_by(serno), you can include project where deployed is TRUE within the same group.

library(dplyr)

df %>%
  group_by(serno) %>%
  mutate(other.project = ifelse(
    deployed == FALSE & used.elsewhere == TRUE,
    project[deployed],
    NA
  ))

Output

  project serno deployed used.elsewhere other.project
  <chr>   <dbl> <lgl>    <lgl>          <chr>        
1 a           1 TRUE     FALSE          NA           
2 b           2 TRUE     TRUE           NA           
3 c           2 FALSE    TRUE           b            
4 c           2 FALSE    TRUE           b            
5 c           3 FALSE    TRUE           e            
6 d           3 FALSE    TRUE           e            
7 e           3 TRUE     TRUE           NA 

Upvotes: 2

matt_jay
matt_jay

Reputation: 1271

I suggest building a mapping of deployments and then just joining it with your df:

library(dplyr)
deployments <- df %>% 
  filter(deployed == TRUE) %>% 
  select(serno, other.project = project)

df %>% 
  left_join(deployments)

You'd then only still have to make sure you have the NAs in the other.project column where you want them (i.e., by recoding where other.project == project).

Upvotes: 1

Related Questions