How to specify condition that depends on a grouping variable in dplyr?

Question

I have the following dataset where I want to create a new column ("Attribute 2") and assign the values by another column ("Attribute 1") based on whether the sum of the value ("Value") equals 2 or more when grouped by the unique ID ("ID") .

For example, for ID = 1001 where:

Attribute 1 = Red, therefore Attribute 2 = Group 3
Attribute 1 = Blue & Value = 1 when grouped by ID, Attribute 2 = Group 1

For ID = 1002 where:

Attribute 1 = Red, therefore Attribute 2 = Group 3
Attribute 1 = Blue & Value >= 2 when grouped by ID, Attribute 2 = Group 2

I currently am using the following formula, but unsure how to to specify the "grouped by unique record" part.

a <- c("1001", "1001", "1002", "1002", "1002", "1003", "1004")
b <- c("Red", "Blue", "Red", "Blue", "Blue", "Green", "Yellow")
c <- c(NA, 1, NA, 1, 1, NA, NA)
df <- data.frame("ID" = a, "Attribute1" = b, "Value" = c)

df <- df %>%
   mutate(Attribute2 = case_when(
      Attribute1 == "Red" ~ "Group 3",
      Attribute1 == "Green" ~ "Group 3",
      Attribute1 == "Yellow" ~ "Group 3",
      Attribute1 == "Blue" & Value == 1 ~ "Group 1",
      Attribute1 == "Blue" & Value >= 2 ~ "Group 2"
   ))

ID	Attribute1	Value	Attribute2
1001	Red	NA	Group 3
1001	Blue	1	Group 1
1002	Red	NA	Group 3
1002	Blue	1	Group 2
1002	Blue	1	Group 2
1003	Green	NA	Group 3
1004	Yellow	NA	Group 3

Ronak Shah · Accepted Answer

Myabe this will help -

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(sum_val = sum(Value, na.rm = TRUE), 
         Attribute2 = case_when(
           Attribute1 %in% c("Red", "Green", "Yellow") ~ "Group 3",
           Attribute1 == "Blue" & sum_val == 1 ~ "Group 1",
           Attribute1 == "Blue" & sum_val >= 2 ~ "Group 2"
         )) %>%
  ungroup()

#    ID Attribute1 Value sum_val Attribute2
#                  
#1  1001 Red           NA       1 Group 3   
#2  1001 Blue           1       1 Group 1   
#3  1002 Red           NA       2 Group 3   
#4  1002 Blue           1       2 Group 2   
#5  1002 Blue           1       2 Group 2   
#6  1003 Green         NA       0 Group 3   
#7  1004 Yellow        NA       0 Group 3

How to specify condition that depends on a grouping variable in dplyr?

Answers (1)

Related Questions