Reputation: 11
I have an R data frame that contains many observations and looks like:
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
I simplified, as obs goes out to obs25. hi1 and hi2 contain the highest and next highest values in each row. I need to get all the rows with obs* > x
but less than hi1 or hi2. In other words, all the rows that have values above a threshold but were not the 2 highest values. Thanks!
Sorry for not being more clear. For example, if the threshold is set at 8 and the above dataframe is used the result would be rows 2 and 3:
in row 2, obs3 and obs4 are > 8, but less than 2 highest
in row 3, obs5 > 8, but less than 2 highest
Upvotes: 0
Views: 1398
Reputation: 1581
I want to explore one possibility here, and that's to find columns that comparatively are less than the two highest columns, each by their respective row element, and less than a threshold which is also a vector to compare element by element. The below example adds a few observations so we can actually see some results come through:
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
obs6=c(6.6,7.3,8.8),
obs7=c(1.1,6.7,9.0),
obs8=c(8.8,8.4,9.6),
obs9=c(6.0,7.8,8.3),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
x is our threshold, which is a vector
x <- c(5.0,5.0,5.0)
now we apply on each column a comparison to the lowest of the two hi columns, combined with a comparison to the threshold. The logical vector is then run as a product, so that a 1 is only reported if all elements are TRUE. e is a logical vector of the columns we want to show.
e <- as.logical(sapply(df, function(y) prod(ifelse(y < df$hi2 & y > x,TRUE,FALSE))>0))
subset our df by column
dfnew <- df[,which(e)]
So if I look at the end result:
dfnew
obs6 obs9
1 6.6 6.0
2 7.3 7.8
3 8.8 8.3
Upvotes: 0
Reputation: 2085
Note that there are no rows meeting the criteria you seem to describe (in this example):
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
x <- 5
#rows which have a min value greater than x
df[which(apply(df[,-c(6:7)], 1, min) > x,),]
#rows which have a max value less than h2
df[which(apply(df[,-c(6:7)], 1, max) < df$h12,),]
#rows which have both
df[intersect(which(apply(df[,-c(6:7)], 1, min) > x,), which(apply(df[,-c(6:7)], 1, max) < df$h12,)),]
Upvotes: 1