Reputation: 7445
I'd like to remove all rows for any day where the value of a is >= the value of b, but I'm not sure how to do this.
Sample data:
df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))
Output:
day var value
1 1 a 1
2 1 b 2
3 2 a 3
4 2 b 3
5 3 a 2
6 3 b 1
Desired output:
day var value
1 1 a 1
2 1 b 2
Upvotes: 3
Views: 1116
Reputation: 2952
Here's a data.table solution that avoids reshaping from long to wide:
library(data.table)
dt <- data.table(df)
dt[, if (value[var == 'a'] < value[var == 'b']) .SD, by = day]
EDIT: note that if (...) .SD returns a day's rows only when the condition is TRUE, so to drop the days where a >= b (as in your desired output) the test is written as a < b.
EDIT2: if you don't want to do it in data.table, here's the dplyr solution:
library(dplyr)
df %>% group_by(day) %>% filter(value[var == 'a'] < value[var == 'b'])
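If you want neither data.table nor dplyr, here is a minimal base R sketch of the same per-day filter. It assumes each day has exactly one a row and one b row; split() plays the role of by = day, and returning NULL for a day drops it, much like if (...) .SD does.

```r
# Sample data from the question (stringsAsFactors = FALSE so var is character)
df <- data.frame(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1),
                 stringsAsFactors = FALSE)

# Split the rows by day; keep a day only when its "a" value is strictly
# less than its "b" value (assumes exactly one "a" and one "b" row per day).
groups <- split(df, df$day)
kept <- lapply(groups, function(d) {
  if (d$value[d$var == "a"] < d$value[d$var == "b"]) d else NULL
})

# rbind() skips the NULL entries, leaving only the surviving days
result <- do.call(rbind, kept)
rownames(result) <- NULL
print(result)
```

This is fine for small data; for many groups the split/rbind round trip is slower than the data.table or dplyr versions.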
EDIT3: if you'd rather keep every row and put NAs in the value column for those days instead, use this:
df %>% group_by(day) %>% mutate(value = if(value[var == 'a'] >= value[var == 'b']) as.numeric(NA) else value)
EDIT4: NOTE: this last solution appears to expose a bug in how NAs are handled; see: Why is dplyr removing values not met by condition?
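For comparison, here is a base R sketch of the NA variant that does not go through a grouped mutate at all. It assumes each day has exactly one a row and one b row; a_by_day, b_by_day and blank are illustrative names, not part of any package.

```r
df <- data.frame(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1),
                 stringsAsFactors = FALSE)

# Per-day lookup vectors named by day (assumes one "a" and one "b" row per day)
a_by_day <- setNames(df$value[df$var == "a"], df$day[df$var == "a"])
b_by_day <- setNames(df$value[df$var == "b"], df$day[df$var == "b"])

# For every row, look up its day's a and b values and flag days where a >= b
day_key <- as.character(df$day)
blank <- a_by_day[day_key] >= b_by_day[day_key]

# Keep all rows, but set value to NA on the flagged days
df_na <- df
df_na$value[blank] <- NA_real_
print(df_na)
```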
Upvotes: 3
Reputation: 16697
Shape's answer is a correct approach to your problem.
To extend Shape's answer, I'd like to contribute a slightly more generic solution.
The eav function from the dwtools package is designed to make calculations on measures in entity-attribute-value (EAV) data easier. The function is defined below, so you don't need the dwtools package.
It calculates an rm variable for each group. The calculation is written the same way as a quoted j argument to a [.data.table call, and it is evaluated after your EAV data has been dcast to wide format and before it is melted back to EAV.
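To make that round trip concrete, here is a base R sketch of what eav does for this example: long to wide, compute rm on the wide table, then back to long. It uses tapply() in place of dcast and a hand-built data frame in place of melt, and assumes one value per (day, var) pair; it is an illustration, not a replacement for eav.

```r
df <- data.frame(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1),
                 stringsAsFactors = FALSE)

# Long -> wide: a day x var matrix, summing duplicates (like fun.aggregate = sum)
wide <- tapply(df$value, list(df$day, df$var), sum)

# Evaluate the calculation on the wide table: rm = 1 when a >= b
wide <- cbind(wide, rm = as.numeric(wide[, "a"] >= wide[, "b"]))

# Wide -> long again: one row per (day, var), including the new rm rows
long <- data.frame(day = as.numeric(rownames(wide))[c(row(wide))],
                   var = colnames(wide)[c(col(wide))],
                   value = as.vector(wide),
                   stringsAsFactors = FALSE)
long <- long[order(long$day, long$var), ]
rownames(long) <- NULL

# Keep days whose rm is 0, then drop the temporary rm rows
keep_days <- long$day[long$var == "rm" & long$value == 0]
result <- long[long$day %in% keep_days & long$var != "rm", ]
print(result)
```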
library(data.table)
eav = function(x, j, id.vars = key(x)[-length(key(x))], variable.name = key(x)[length(key(x))], measure.vars = names(x)[!(names(x) %in% key(x))], fun.aggregate = sum, shift.on = character(), wide=FALSE){
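stopifnot(is.data.table(x))
# 1) aggregate measure.vars by id.vars + variable.name with fun.aggregate
# 2) dcast long -> wide, one column per level of variable.name
# 3) evaluate the quoted j on the wide table, by id.vars (excluding shift.on)
# 4) melt back to long (EAV) unless wide = TRUE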
r <- x[,lapply(.SD,fun.aggregate),c(id.vars,variable.name),.SDcols=measure.vars
][,dcast(.SD,formula=as.formula(paste(paste(id.vars,collapse=' + '),paste(variable.name,collapse=' + '),sep=' ~ ')),fun.aggregate=fun.aggregate,value.var=measure.vars)
][,eval(j), by = eval(id.vars[!(id.vars %in% shift.on)])
]
if(wide) r[] else melt(r,id.vars=id.vars, variable.name=variable.name, value.name=measure.vars)[,.SD,keyby=c(id.vars,variable.name)]
}
df = data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))
dt = as.data.table(df)
setkey(dt, day, var)
r = eav(dt, quote(rm := as.numeric(a >= b)))
print(r)
# day var value
#1: 1 a 1
#2: 1 b 2
#3: 1 rm 0
#4: 2 a 3
#5: 2 b 3
#6: 2 rm 1
#7: 3 a 2
#8: 3 b 1
#9: 3 rm 1
r[, if(value[var=="rm"] == 0) .SD, by = day
  ][var!="rm"] # exclude the temporary rm variable
# day var value
#1: 1 a 1
#2: 1 b 2
This solution may also be slower than Shape's (you could enlarge your sample data to measure this), but it may be easier for complex computations on many measures in EAV data, and it supports shifting; see the eav examples.
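One way to get data big enough to time is to generate it in the same shape as the sample. The sketch below is an assumption-laden example: the number of days, the value range and the seed are arbitrary, and it times only a vectorised base R filter; the data.table, dplyr or eav versions can be wrapped in system.time() the same way.

```r
# Arbitrary sizes; adjust to resemble your real data
set.seed(42)
n_days <- 100000
big <- data.frame(day = rep(seq_len(n_days), each = 2),
                  var = rep(c("a", "b"), times = n_days),
                  value = sample(1:10, 2 * n_days, replace = TRUE),
                  stringsAsFactors = FALSE)

# Time one candidate; relies on rows being ordered a, b within each day
timing <- system.time({
  a_by_day <- big$value[big$var == "a"]
  b_by_day <- big$value[big$var == "b"]
  keep_day <- a_by_day < b_by_day
  result <- big[rep(keep_day, each = 2), ]
})
print(timing)
```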
Upvotes: 3