Reputation: 185
I have the following dataset where I want to create a new column ("Attribute 2") and assign the values by another column ("Attribute 1") based on whether the sum of the value ("Value") equals 2 or more when grouped by the unique ID ("ID") .
For example, for ID = 1001 where:
For ID = 1002 where:
I currently am using the following formula, but unsure how to to specify the "grouped by unique record" part.
a <- c("1001", "1001", "1002", "1002", "1002", "1003", "1004")
b <- c("Red", "Blue", "Red", "Blue", "Blue", "Green", "Yellow")
c <- c(NA, 1, NA, 1, 1, NA, NA)
df <- data.frame("ID" = a, "Attribute1" = b, "Value" = c)
df <- df %>%
mutate(Attribute2 = case_when(
Attribute1 == "Red" ~ "Group 3",
Attribute1 == "Green" ~ "Group 3",
Attribute1 == "Yellow" ~ "Group 3",
Attribute1 == "Blue" & Value == 1 ~ "Group 1",
Attribute1 == "Blue" & Value >= 2 ~ "Group 2"
))
ID | Attribute1 | Value | Attribute2 |
---|---|---|---|
1001 | Red | NA | Group 3 |
1001 | Blue | 1 | Group 1 |
1002 | Red | NA | Group 3 |
1002 | Blue | 1 | Group 2 |
1002 | Blue | 1 | Group 2 |
1003 | Green | NA | Group 3 |
1004 | Yellow | NA | Group 3 |
Upvotes: 0
Views: 772
Reputation: 389225
Myabe this will help -
library(dplyr)
df %>%
group_by(ID) %>%
mutate(sum_val = sum(Value, na.rm = TRUE),
Attribute2 = case_when(
Attribute1 %in% c("Red", "Green", "Yellow") ~ "Group 3",
Attribute1 == "Blue" & sum_val == 1 ~ "Group 1",
Attribute1 == "Blue" & sum_val >= 2 ~ "Group 2"
)) %>%
ungroup()
# ID Attribute1 Value sum_val Attribute2
# <int> <chr> <int> <int> <chr>
#1 1001 Red NA 1 Group 3
#2 1001 Blue 1 1 Group 1
#3 1002 Red NA 2 Group 3
#4 1002 Blue 1 2 Group 2
#5 1002 Blue 1 2 Group 2
#6 1003 Green NA 0 Group 3
#7 1004 Yellow NA 0 Group 3
Upvotes: 0