Fedorenko Kristina

Reputation: 2767

How to select rows from a data frame using a condition from many columns in R

I have a data frame with 800 columns. I want to select rows using a condition on every column. How can I do that without writing one huge which() call like data[which(data$V_1 < bound_1 & ... & data$V_n < bound_n),]?

This is a fragment of my data frame

    type_Browser os_name_Windows XP ua_family_Chrome ua_name_Chrome0
 [1,]   0.06453172         0.09318651       0.09849316        0.1962756
 [2,]   0.06453172         0.09318651       0.09849316        0.1962756
 [3,]   0.06453172         0.09318651       0.00000000        0.0000000
 [4,]   0.06453172         0.00000000       0.00000000        0.0000000
 [5,]   0.06453172         0.00000000       0.09849316        0.1962756
 [6,]   0.06453172         0.09318651       0.00000000        0.0000000
 [7,]   0.06453172         0.00000000       0.00000000        0.0000000
 [8,]   0.06453172         0.09318651       0.00000000        0.0000000
 [9,]   0.06453172         0.00000000       0.09849316        0.1962756
[10,]   0.06453172         0.09318651       0.00000000        0.0000000

This is a fragment of the cluster centers after kmeans

       type_Browser os_name_Windows XP ua_family_Chrome ua_name_Chrome 0
    1     0.9973870          0.9014791        0.8885468        0.9162910
    2     0.1370203          0.9323763        0.3940263        0.8250081
    3     0.7121533          0.9541988        0.1418068        0.6568214
    4     0.9998909          0.9881944        0.9959341        0.3181853
    5     0.9278844          0.9796447        0.9247542        0.9510941
    6     0.9784205          0.8586415        0.8902691        0.8210114
    7     0.7115432          0.9930360        0.9652756        0.9735471
    8     0.9907865          0.9896360        0.9910279        0.9781258
    9     0.9967735          0.9919486        0.9921240        0.9702438
    10    0.9998825          0.9940538        0.9970676        0.9839453

Then I make two bounds

lowerBound = centers - eps;
upperBound = centers + eps;

Then I want to select the rows that lie in [centers - eps, centers + eps].

for(i in 1:k){
  ithLB = lowerBound[i,];
  ithUB = upperBound[i,];
  ithKernel <- data[which(data[,1] >= ithLB[1] & data[,1] <= ithUB[1] & ... & data[,812] >= ithLB[812] & data[,812] <= ithUB[812]), ] # I want to replace this expression with something more reasonable.
}

Upvotes: 2

Views: 274

Answers (1)

akrun

Reputation: 886938

You could try

data[Reduce(`&`,Map('<', data, bound)),]
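To see what the one-liner does, here is a minimal sketch on a toy data frame. The names df, per_column and keep are made up for illustration and are not from the question. Map compares each column with its own bound and returns one logical vector per column; Reduce then combines those vectors with `&`, so a row is kept only if every column passes.

```r
# Hypothetical toy data, not the data from the question
df <- data.frame(V_1 = c(1, 5, 2), V_2 = c(3, 1, 9), V_3 = c(0, 4, 2))
bound <- c(4, 6, 3)   # one upper bound per column

# One logical vector per column: df[[j]] < bound[j]
per_column <- Map(`<`, df, bound)

# Element-wise AND across columns gives one flag per row
keep <- Reduce(`&`, per_column)

result <- df[keep, ]
```

If the data may contain NA, wrapping the index in which(), as in the question, drops rows whose flag is NA instead of returning NA rows.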

Suppose the bounds are stored as separate objects "bound_1", "bound_2", ..., "bound_N". You can collect them into a list with mget

 bound <- mget(paste('bound', 1:ncol(data), sep="_"))

and then use the same code as above.

Another, less optimal, option is to build the condition as a string with paste and evaluate it with eval(parse(...)) (not recommended)

str1 <- paste(paste(paste0('data$',paste('V', 1:ncol(data), sep="_")),
  paste('bound', 1:ncol(data), sep="_"), sep=" < "), collapse=" & ")
data[eval(parse(text=str1)),]

data

set.seed(153)
data <- as.data.frame(matrix(sample(0:8, 5*20, replace=TRUE), ncol=5))
colnames(data) <- paste('V', 1:ncol(data), sep="_")
bound <- sample(1:15, 5, replace=TRUE)

In case you have separate objects "bound_1", "bound_2", etc. instead of a single vector

bound_1 <- 6
bound_2 <- 8
bound_3 <- 7
bound_4 <- 7
bound_5 <- 14
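For the interval check in the question (each row compared against lowerBound[i,] and upperBound[i,] for every cluster i), the same idea might look like the sketch below. It assumes data is a numeric matrix, as the printed fragment suggests; the random data, eps value and the kernels list are stand-ins I made up for illustration. sweep() applies the comparison column by column, so no column has to be named explicitly.

```r
set.seed(42)
# Hypothetical stand-ins for the objects in the question
data <- matrix(runif(200 * 4), ncol = 4)
k <- 3
centers <- kmeans(data, centers = k)$centers
eps <- 0.3
lowerBound <- centers - eps
upperBound <- centers + eps

kernels <- lapply(seq_len(k), function(i) {
  # Compare every column with its own bound; each call returns a logical matrix
  above <- sweep(data, 2, lowerBound[i, ], `>=`)
  below <- sweep(data, 2, upperBound[i, ], `<=`)
  # Keep the rows where all columns pass both checks
  data[rowSums(above & below) == ncol(data), , drop = FALSE]
})
```

kernels[[i]] then holds the rows that fall inside the box around the i-th center. The Reduce/Map form above should work as well if data is a data frame and the bounds are passed as vectors.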

Upvotes: 1
