Flag based on multiple conditions

Question

This is my initial dataset:

x <- c("a","a","b","b","c","c","d","d")
y <- c("a","a","a","b","c","c", "d", "d")
z <- c(5,1,2,6,1,1,5,6)
df <- data.frame(x,y,z)

I am trying to add a column to the data frame that flags each row meeting the following condition:

There is a row in the dataset with the same "x" and "y" values, and at least one of the rows with that "x" and "y" has a "z" value >= 5.

With the example provided, the output should be:

  x y z  flag
1 a a 5  TRUE
2 a a 1  TRUE
3 b a 2 FALSE
4 b b 6  TRUE
5 c c 1 FALSE
6 c c 1 FALSE
7 d d 5  TRUE
8 d d 6  TRUE

Thank you!

Jonny Phelps · Accepted Answer

I use the data.table package for all my aggregations. With it, I would do the following:

library(data.table)
dt <- as.data.table(df)
# by = .(x, y) groups the rows by each (x, y) combination.
# Within each group, the flag is TRUE when both conditions hold:
#   1. the maximum z value is >= 5
#   2. the group has more than one row (.N is data.table's row count for the current group)
# := assigns the result back into the original data.table by reference.
dt[, flag := max(z) >= 5 & .N > 1, by=.(x, y)]

# If x must also equal y, use this instead:
dt[, flag := max(z) >= 5 & .N > 1 & x == y, by=.(x, y)]

# view the result
dt[]

# convert back to a data.frame
df <- as.data.frame(dt)
df
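
For comparison, here is a minimal base-R sketch of the same grouping idea, with no extra packages. It uses ave() to repeat a per-group summary on every row of the group. The pasted key, the "|" separator (assumed not to occur in x or y), and the column names flag_strict and flag_loose are hypothetical choices for illustration. On the example data, the `.N > 1` rule gives FALSE for row 4, because the (b, b) pair occurs only once, while the expected output in the question shows TRUE there. Dropping the row-count test reproduces the expected column, so both variants are shown.

```r
# Rebuild the example data from the question.
df <- data.frame(
  x = c("a", "a", "b", "b", "c", "c", "d", "d"),
  y = c("a", "a", "a", "b", "c", "c", "d", "d"),
  z = c(5, 1, 2, 6, 1, 1, 5, 6)
)

# One key per (x, y) pair; assumes "|" never appears in x or y.
key <- paste(df$x, df$y, sep = "|")

# Largest z in each group, repeated on every row of that group.
grp_max <- ave(df$z, key, FUN = max)
# Number of rows in each group, repeated on every row of that group.
grp_n <- ave(df$z, key, FUN = length)

# Same rule as the data.table line above: group max >= 5 AND more than one row.
df$flag_strict <- grp_max >= 5 & grp_n > 1
# Looser rule (an assumption about the intent): group max >= 5 only.
df$flag_loose <- grp_max >= 5

df
```

Which variant is right depends on whether a single-row group such as (b, b) should count. If it should, the data.table equivalent of the looser rule would be dt[, flag := max(z) >= 5, by = .(x, y)].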
