CrisXP

Reputation: 31

Flag based on multiple conditions

This is my initial dataset:

x <- c("a","a","b","b","c","c","d","d")
y <- c("a","a","a","b","c","c", "d", "d")
z <- c(5,1,2,6,1,1,5,6)
df <- data.frame(x,y,z)

I am trying to create a column in a dataframe to flag if there is another row in the dataset with the following condition:

With the example provided, the output should be:

  x y z  flag
1 a a 5  TRUE
2 a a 1  TRUE
3 b a 2 FALSE
4 b b 6  TRUE
5 c c 1 FALSE
6 c c 1 FALSE
7 d d 5  TRUE
8 d d 6  TRUE

Thank you!

Upvotes: 0

Views: 334

Answers (2)

Jonny Phelps

Reputation: 2727

I use the data.table package for all my aggregations. With it, I would do the following:

library(data.table)
dt <- as.data.table(df)
# by=.(x, y): group by x and y
# flag every row of a group where
# 1. the maximum z value in the group is >= 5
# 2. there is more than one row for that (x, y) combo; .N is data.table syntax for the number of rows in the group
# := is data.table syntax for assigning a column by reference in the original data.table
# note: a single-row group such as (b, b) gets FALSE here because of .N > 1
dt[, flag := max(z) >= 5 & .N > 1, by=.(x, y)]

# Does x need to equal y? If so, use this instead:
dt[, flag := max(z) >= 5 & .N > 1 & x == y, by=.(x, y)]

# view the result
dt[]

# return back to df
df <- as.data.frame(dt)
df
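
For readers without data.table, the same grouped logic can be sketched in base R. This is only an illustration of the first `flag` rule above (group maximum of z at least 5, and more than one row in the group); the names `key`, `group_max` and `group_size` are my own, and the `"|"` separator is an arbitrary choice that assumes it does not occur in x or y.

```r
# Rebuild the example data from the question
df <- data.frame(
  x = c("a", "a", "b", "b", "c", "c", "d", "d"),
  y = c("a", "a", "a", "b", "c", "c", "d", "d"),
  z = c(5, 1, 2, 6, 1, 1, 5, 6),
  stringsAsFactors = FALSE
)

# One key per (x, y) combination (hypothetical helper, not in the original answer)
key <- paste(df$x, df$y, sep = "|")

# ave() returns a vector as long as z that holds each row's group summary
group_max  <- ave(df$z, key, FUN = max)
group_size <- ave(df$z, key, FUN = length)

# Same rule as max(z) >= 5 & .N > 1 grouped by (x, y)
df$flag <- group_max >= 5 & group_size > 1
df
```

As with the data.table version, the single-row (b, b) group comes out FALSE under this rule.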

Upvotes: 2

ThomasIsCoding

Reputation: 101373

You can try the code below:

> within(df, flag <- x==y & z>=5)
  x y z  flag
1 a a 5  TRUE
2 a a 1 FALSE
3 b a 2 FALSE
4 b b 6  TRUE
5 c c 1 FALSE
6 c c 1 FALSE
7 d d 5  TRUE
8 d d 6  TRUE
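
The row-wise test above only looks at each row's own z, so row 2 comes out FALSE. If the intent is to flag every row of an (x, y) group that contains a qualifying row, a group-wise variant might look like the sketch below. The rule used here (x equals y, and some row in the same group has z >= 5) is an assumption on my part, chosen because it reproduces the flag column shown in the question; `key` is a hypothetical helper name.

```r
# Rebuild the example data from the question
df <- data.frame(
  x = c("a", "a", "b", "b", "c", "c", "d", "d"),
  y = c("a", "a", "a", "b", "c", "c", "d", "d"),
  z = c(5, 1, 2, 6, 1, 1, 5, 6),
  stringsAsFactors = FALSE
)

# One key per (x, y) combination; assumes "|" never appears in x or y
key <- paste(df$x, df$y, sep = "|")

# Assumed rule: x equals y and the largest z in the group is at least 5
df$flag <- df$x == df$y & ave(df$z, key, FUN = max) >= 5
df
```

Under this assumed rule the single-row (b, b) group is flagged TRUE, because no minimum group size is required.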

Upvotes: 0
