Vedda

Reputation: 7445

Remove melted data based on a condition

I'd like to remove all rows for a day where the value of a is >= the value of b, but I'm not sure how to do this.

Sample data:

df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))

Output:

  day var value
1   1   a     1
2   1   b     2
3   2   a     3
4   2   b     3
5   3   a     2
6   3   b     1

Desired output:

  day var value
1   1   a     1
2   1   b     2

Upvotes: 3

Views: 1116

Answers (2)

Shape

Reputation: 2952

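Here's a data.table solution that avoids reshaping from long to wide:

library(data.table)
dt <- as.data.table(df)
# keep a day's rows only when a < b, i.e. drop every day where a >= b
dt[, if (value[var == 'a'] < value[var == 'b']) .SD, by = day]

EDIT: Note the direction of the inequality: the condition keeps the days where a < b, so every day with a >= b is dropped, which matches your desired output :)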
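For comparison, the long-to-wide route that the solution above avoids might look roughly like this. It is a sketch that assumes each day has exactly one a row and one b row, as in your sample; the names wide, keep_days and res are just for illustration.

```r
library(data.table)

df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))
dt <- as.data.table(df)

# reshape to one row per day, with one column per var (here: a and b)
wide <- dcast(dt, day ~ var, value.var = "value")

# days to keep: those where a < b
keep_days <- wide[a < b, day]

# filter the original long data back down to those days
res <- dt[day %in% keep_days]
print(res)
```

The extra dcast step is what the group-wise `if (...) .SD` version saves you.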

EDIT2: If you'd rather not use data.table, here's the dplyr equivalent:

library(dplyr)
df %>% group_by(day) %>% filter(value[var == 'a'] < value[var == 'b'])
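If you'd rather avoid packages altogether, a base R sketch of the same filter could pair each day's a and b values with merge() and then keep the matching days. Like the versions above, it assumes exactly one a row and one b row per day; the names ab and keep_days are hypothetical.

```r
df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))

# put each day's a and b values side by side: columns day, value.a, value.b
ab <- merge(df[df$var == "a", c("day", "value")],
            df[df$var == "b", c("day", "value")],
            by = "day", suffixes = c(".a", ".b"))

# days to keep: those where a < b (so every day with a >= b is dropped)
keep_days <- ab$day[ab$value.a < ab$value.b]

res <- df[df$day %in% keep_days, ]
print(res)
```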

EDIT3: If you'd rather replace the values with NA than drop the rows:

df %>% group_by(day) %>% mutate(value = if (value[var == 'a'] >= value[var == 'b']) NA_real_ else value)

EDIT4: Note that this last solution appears to expose a bug in how NAs are handled; see here: Why is dplyr removing values not met by condition?
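One variant of the NA approach computes the per-day comparison as its own column first and only then blanks out the values with ifelse(), instead of returning different-length results from an `if` inside mutate(). This is a sketch; the helper column name a_ge_b is hypothetical, and I have not checked whether this sidesteps the behaviour described in EDIT4 on the dplyr version that showed it.

```r
library(dplyr)

df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))

res <- df %>%
  group_by(day) %>%
  # one TRUE/FALSE per day, recycled to every row of that day
  mutate(a_ge_b = value[var == "a"] >= value[var == "b"]) %>%
  ungroup() %>%
  # set the values on flagged days to NA instead of dropping the rows
  mutate(value = ifelse(a_ge_b, NA_real_, value)) %>%
  select(-a_ge_b)

print(res)
```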

Upvotes: 3

jangorecki

Reputation: 16697

Shape's answer is a correct approach to your problem.
To extend it, I want to contribute a slightly more generic solution.
The eav function in the dwtools package is designed to make calculations on measures in entity-attribute-value (EAV) data easier. The function is defined below, so you don't need the dwtools package.
Here it calculates an rm variable for each group. The formula for the calculation is written the same way as a quoted j argument to a [.data.table call on your data after it has been cast from EAV to wide form with dcast, and before it is melted back to EAV.

library(data.table)
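# x: keyed data.table in EAV (long) form; the last key column holds the attribute names
# j: quoted expression evaluated on the wide (cast) form, grouped by id.vars excluding shift.on
eav = function(x, j, id.vars = key(x)[-length(key(x))], variable.name = key(x)[length(key(x))], measure.vars = names(x)[!(names(x) %in% key(x))], fun.aggregate = sum, shift.on = character(), wide=FALSE){
    stopifnot(is.data.table(x))
    # aggregate measures per id and attribute, dcast to wide, then evaluate j by id group
    r <- x[,lapply(.SD,fun.aggregate),c(id.vars,variable.name),.SDcols=measure.vars
           ][,dcast(.SD,formula=as.formula(paste(paste(id.vars,collapse=' + '),paste(variable.name,collapse=' + '),sep=' ~ ')),fun.aggregate=fun.aggregate,value.var=measure.vars)
             ][,eval(j), by = eval(id.vars[!(id.vars %in% shift.on)])
               ]
    # return the wide table, or melt it back to EAV form ordered by the key columns
    if(wide) r[] else melt(r,id.vars=id.vars, variable.name=variable.name, value.name=measure.vars)[,.SD,keyby=c(id.vars,variable.name)]
}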

df = data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))
dt = as.data.table(df)
setkey(dt, day, var)
r = eav(dt, quote(rm := as.numeric(a >= b)))
print(r)
#   day var value
#1:   1   a     1
#2:   1   b     2
#3:   1  rm     0
#4:   2   a     3
#5:   2   b     3
#6:   2  rm     1
#7:   3   a     2
#8:   3   b     1
#9:   3  rm     1
r[, if(value[var=="rm"] == 0) .SD, by = day
  ][var!="rm"] # you need to exclude temporary variable
#   day var value
#1:   1   a     1
#2:   1   b     2

This solution may be slower than Shape's (you can scale your sample up to big data to measure it), but it may be easier for complex computations on many measures in EAV data, and it supports shifting; see the eav examples.
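To measure the difference, you could scale the sample up. The sketch below is only an illustration under assumptions: the size n_days, the random values drawn with sample(), and all variable names are hypothetical. It times Shape's group-wise filter against a hand-written dcast, compute, melt round trip (the same steps eav performs for this j) rather than calling eav itself, so it does not include eav's extra aggregation step.

```r
library(data.table)

set.seed(1)
n_days <- 1e5  # hypothetical size; increase it to stress-test further
big <- data.table(day = rep(seq_len(n_days), each = 2),
                  var = rep(c("a", "b"), times = n_days),
                  value = as.numeric(sample(1:10, 2 * n_days, replace = TRUE)))

# group-wise filter (Shape's approach): keep days where a < b
t_group <- system.time(
  res_group <- big[, if (value[var == "a"] < value[var == "b"]) .SD, by = day]
)

# cast, compute rm, melt back, then drop flagged days and the helper rows
t_cast <- system.time({
  wide <- dcast(big, day ~ var, value.var = "value")
  wide[, rm := as.numeric(a >= b)]
  long <- melt(wide, id.vars = "day", variable.name = "var", value.name = "value")
  res_cast <- long[, if (value[var == "rm"] == 0) .SD, by = day][var != "rm"]
})

# elapsed seconds for each approach
print(rbind(group = t_group, cast = t_cast)[, "elapsed"])
```

Both paths should keep the same days; timings will depend on your machine and data.table version.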

Upvotes: 3
