Reputation: 31
This is my initial dataset:
x <- c("a","a","b","b","c","c","d","d")
y <- c("a","a","a","b","c","c", "d", "d")
z <- c(5,1,2,6,1,1,5,6)
df <- data.frame(x,y,z)
I am trying to add a column to the data frame that flags whether there is another row in the dataset meeting the following condition:
With the example provided, the output should be:
  x y z  flag
1 a a 5  TRUE
2 a a 1  TRUE
3 b a 2 FALSE
4 b b 6  TRUE
5 c c 1 FALSE
6 c c 1 FALSE
7 d d 5  TRUE
8 d d 6  TRUE
Thank you!
Upvotes: 0
Views: 334
Reputation: 2727
I use the data.table package for all my aggregations. With this package I would do the following:
library(data.table)
dt <- as.data.table(df)
# by = .(x, y): group by x and y
# flag the groups where both of these hold:
#   1. the maximum z value is >= 5
#   2. the (x, y) combination has more than one row (.N is data.table's row count for the current group)
# := is data.table syntax that assigns the result back into the original data.table by reference
dt[, flag := max(z) >= 5 & .N > 1, by=.(x, y)]
# If x must also equal y, use this instead:
dt[, flag := max(z) >= 5 & .N > 1 & x == y, by=.(x, y)]
# view the result
dt[]
# convert back to a data.frame
df <- as.data.frame(dt)
df
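Note that the `.N > 1` test above flags row 4 as FALSE, because its (b, b) group has only one row, whereas the expected output in the question shows TRUE for that row. As a rough sketch only: the question's condition is not shown, so the rule below is an assumption, namely that a row is flagged when x equals y and its (x, y) group contains at least one z >= 5. With the sample data this happens to reproduce the expected flags. It uses base R's `ave()`, and the names `grp` and `group_max` are made up for illustration.

```r
# Rebuild the sample data from the question
x <- c("a", "a", "b", "b", "c", "c", "d", "d")
y <- c("a", "a", "a", "b", "c", "c", "d", "d")
z <- c(5, 1, 2, 6, 1, 1, 5, 6)
df <- data.frame(x, y, z, stringsAsFactors = FALSE)

# ASSUMED rule (the question does not state it):
# flag a row when x equals y and its (x, y) group has at least one z >= 5.

# One key per (x, y) combination; a pasted key avoids empty factor combinations
grp <- paste(df$x, df$y, sep = "|")

# Maximum z within each (x, y) group, repeated for every row of that group
group_max <- ave(df$z, grp, FUN = max)

df$flag <- df$x == df$y & group_max >= 5
df
```

If the actual rule is different (for example, if groups with a single row should not be flagged), only the last assignment needs to change.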
Upvotes: 2
Reputation: 101373
You can try the code below:
> within(df, flag <- x==y & z>=5)
  x y z  flag
1 a a 5  TRUE
2 a a 1 FALSE
3 b a 2 FALSE
4 b b 6  TRUE
5 c c 1 FALSE
6 c c 1 FALSE
7 d d 5  TRUE
8 d d 6  TRUE
Upvotes: 0