Reputation: 2767
I have a data frame with 800 columns. I want to select rows from it using a condition on every column. How can I do that without writing one huge which() call
like
data[which(data$V_1 < bound_1 & ...& data$V_n<bound_n),]
This is a fragment of my data frame
type_Browser os_name_Windows XP ua_family_Chrome ua_name_Chrome0
[1,] 0.06453172 0.09318651 0.09849316 0.1962756
[2,] 0.06453172 0.09318651 0.09849316 0.1962756
[3,] 0.06453172 0.09318651 0.00000000 0.0000000
[4,] 0.06453172 0.00000000 0.00000000 0.0000000
[5,] 0.06453172 0.00000000 0.09849316 0.1962756
[6,] 0.06453172 0.09318651 0.00000000 0.0000000
[7,] 0.06453172 0.00000000 0.00000000 0.0000000
[8,] 0.06453172 0.09318651 0.00000000 0.0000000
[9,] 0.06453172 0.00000000 0.09849316 0.1962756
[10,] 0.06453172 0.09318651 0.00000000 0.0000000
This is a fragment of centers of clusters after kmeans
type_Browser os_name_Windows XP ua_family_Chrome ua_name_Chrome 0
1 0.9973870 0.9014791 0.8885468 0.9162910
2 0.1370203 0.9323763 0.3940263 0.8250081
3 0.7121533 0.9541988 0.1418068 0.6568214
4 0.9998909 0.9881944 0.9959341 0.3181853
5 0.9278844 0.9796447 0.9247542 0.9510941
6 0.9784205 0.8586415 0.8902691 0.8210114
7 0.7115432 0.9930360 0.9652756 0.9735471
8 0.9907865 0.9896360 0.9910279 0.9781258
9 0.9967735 0.9919486 0.9921240 0.9702438
10 0.9998825 0.9940538 0.9970676 0.9839453
Then I make two bounds
lowerBound = centers - eps;
upperBound = centers + eps;
Then I want to select the rows that lie in [ centers - eps, centers + eps ].
for(i in 1:k){
ithLB = lowerBound[i,];
ithUB = upperBound[i,];
ithKernel <- data[which(data[,1] <= lowerBound[1] & ... & data[,812] <= lowerBound[812]), ] # I want to replace this expression with something more reasonable.
}
Upvotes: 2
Views: 274
Reputation: 886938
You could try
data[Reduce(`&`,Map('<', data, bound)),]
If the bounds are stored as separate objects "bound_1", "bound_2", ..., "bound_N", collect them into a list first
bound <- mget(paste('bound', 1:ncol(data), sep="_"))
and use the same code as above
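To see what the Map/Reduce one-liner is doing, here is a minimal sketch on a toy data frame. The names toy and toy_bound and their values are made up for illustration and are not from the question. It assumes the data is a data frame (a list of columns); a matrix would need as.data.frame() first, because Map would otherwise iterate over single elements.

```r
# Toy data frame and per-column bounds (hypothetical values, for illustration only)
toy <- data.frame(V_1 = c(1, 5, 2), V_2 = c(3, 1, 9))
toy_bound <- c(4, 8)

# Map() pairs column j with bound j and returns one logical vector per column
per_column <- Map(`<`, toy, toy_bound)

# Reduce() ANDs those vectors together into a single row mask
keep <- Reduce(`&`, per_column)

# Keeps only the first row: it is the only one below the bound in every column
toy[keep, ]
```

The same pattern works with a list of bounds (as returned by mget) because Map walks over list elements and vector elements alike.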
Another, less optimal option (not recommended) would be to build the condition with paste and evaluate it with eval(parse
str1 <- paste(paste(paste0('data$',paste('V', 1:ncol(data), sep="_")),
paste('bound', 1:ncol(data), sep="_"), sep=" < "), collapse=" & ")
data[eval(parse(text=str1)),]
Example data used above:
set.seed(153)
data <- as.data.frame(matrix(sample(0:8, 5*20, replace=TRUE), ncol=5))
colnames(data) <- paste('V', 1:ncol(data), sep="_")
bound <- sample(1:15, 5, replace=TRUE)
In case you have "bound_1", "bound_2", etc. instead of a vector:
bound_1 <- 6
bound_2 <- 8
bound_3 <- 7
bound_4 <- 7
bound_5 <- 14
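For the two-sided case in the question (rows lying in [ centers - eps, centers + eps ] for each cluster), the same idea can be applied once per bound. This is only a sketch under assumptions: dat, the centers values and eps below are made-up stand-ins, the data is assumed to be an all-numeric data frame, and centers is assumed to be a matrix with one row per cluster and one column per variable.

```r
# Hypothetical stand-ins for the question's objects
set.seed(1)
dat <- as.data.frame(matrix(runif(40), ncol = 4))
centers <- matrix(c(0.5, 0.5, 0.5, 0.5,
                    0.2, 0.8, 0.2, 0.8), nrow = 2, byrow = TRUE)
eps <- 0.45

lowerBound <- centers - eps
upperBound <- centers + eps

# One list element per cluster: the rows of dat whose every column
# lies inside [lowerBound[i, ], upperBound[i, ]]
kernels <- lapply(seq_len(nrow(centers)), function(i) {
  above <- Reduce(`&`, Map(`>=`, dat, lowerBound[i, ]))
  below <- Reduce(`&`, Map(`<=`, dat, upperBound[i, ]))
  dat[above & below, , drop = FALSE]
})
```

Using lapply instead of a for loop just collects the per-cluster results in a list; kernels[[i]] plays the role of ithKernel from the question.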
Upvotes: 1