Emily Fassbender
Emily Fassbender

Reputation: 423

duplicate id with two different values in second column

I am dealing with a dataset which looks like this below

 id    ColA     ColB
 1     No       Red
 1     Yes      Red
 2     No       Blue
 3     No       Blue
 3     No       Blue
 4     No       Red
 4     Yes      Red

I like to replace the values in ColA where

  - ID is duplicate { 1,3,4..}
  - ColB values are the same (id 1, Red,Red) but
  - ColA values are different(id1, yes, no)

If the above scenario is true then ColA should be replaced with Yes.

The final dataset should look like this below

  Id    ColA     ColB
  1     Yes      Red 
  1     Yes      Red
  2     No       Blue
  3     No       Blue
  3     No       Blue
  4     Yes      Red
  4     Yes      Red

Any suggestions or pointers are much appreciated.

Upvotes: 2

Views: 43

Answers (2)

jogo
jogo

Reputation: 12559

Here is a solution with data.table:

library("data.table")
DT <- fread(
" id    ColA     ColB
 1     No       Red
 1     Yes      Red
 2     No       Blue
 3     No       Blue
 3     No       Blue
 4     No       Red
 4     Yes      Red")
DT[, ColA:=ifelse(uniqueN(ColB)==1 && uniqueN(ColA)==2, "Yes", ColA) , by=id][]
# > DT[, ColA:=ifelse(uniqueN(ColB)==1 && uniqueN(ColA)==2, "Yes", ColA) , by=id][]
#    id ColA ColB
# 1:  1  Yes  Red
# 2:  1  Yes  Red
# 3:  2   No Blue
# 4:  3   No Blue
# 5:  3   No Blue
# 6:  4  Yes  Red
# 7:  4  Yes  Red

a variant without ifelse():

DT[, ColA := if (uniqueN(ColB)==1 && uniqueN(ColA)==2) "Yes", by=id][]
# > DT[, ColA := if (uniqueN(ColB)==1 && uniqueN(ColA)==2) "Yes", by=id][]
#    id ColA ColB
# 1:  1  Yes  Red
# 2:  1  Yes  Red
# 3:  2   No Blue
# 4:  3   No Blue
# 5:  3   No Blue
# 6:  4  Yes  Red
# 7:  4  Yes  Red

Upvotes: 1

akrun
akrun

Reputation: 887291

We can do

library(dplyr)
df1 %>%
   group_by(id, ColB) %>%
   mutate(ColA = if("Yes" %in% ColA) "Yes" else "No")

Upvotes: 1

Related Questions