Reputation: 21
I'm sure the answer to this question is out there already, but I can't find it, since I'm a beginner at R and don't know what search terms to use.
I want to retrieve the rows in a data frame where a given proportion of the columns meet a criterion. For example, at least 2/3 of the columns > 1.4.
Here is what I have so far:
a<-c(1.1,1.2,1.3,1.4,1.5)
b<-c(1.3,1.4,1.5,1.6,1.7)
c<-c(1.5,1.6,1.7,1.8,1.9)
data<-data.frame(a,b,c)
data
a b c
1 1.1 1.3 1.5
2 1.2 1.4 1.6
3 1.3 1.5 1.7
4 1.4 1.6 1.8
5 1.5 1.7 1.9
c<-function(x) (length(x[(x>1.4)]))>=(2/3*ncol(data))
d<-apply(data,1,c)
result<-data[d,]
result
a b c
3 1.3 1.5 1.7
4 1.4 1.6 1.8
5 1.5 1.7 1.9
This works, but I feel like there must be a simpler way, or that the function could be written differently? I'm still trying to properly understand how functions work.
Of course, in reality my data frame would have a lot of columns.
/Grateful beginner
Upvotes: 2
Views: 148
Reputation: 1196
Just to give another alternative to David's answer: you can use the mean function on a vector of logical values in R to return the proportion of TRUE values in the vector.
Create the data
a<-c(1.1, 1.2, 1.3, 1.4, 1.5)
b<-c(1.3, 1.4, 1.5, 1.6, 1.7)
c<-c(1.5, 1.6, 1.7, 1.8, 1.9)
data<-data.frame(a, b, c)
A function to return a logical vector indicating if the values are above the threshold
gt <- function(x, threshold) {
  tmp <- x > threshold
  return(tmp)
}
An example using the first row of the data.frame
gt(data[1,], 1.4)
If you take the sum of the logical vector, it returns the number of TRUE instances:
sum(gt(data[1,], 1.4))
# [1] 1
and if you use the mean function, it returns the proportion of TRUE instances:
mean(gt(data[1,], 1.4))
# [1] 0.3333333
Using that you can use David's approach:
index <- apply(data,1, function(x) sum(gt(x, 1.4)) >= 2/3 * length(x))
or you can work with the proportion directly via the mean function (0.6 here is simply a cutoff just below 2/3):
index <- apply(data,1, function(x) mean(gt(x, 1.4)) > 0.6)
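As a quick check, here is a minimal sketch that builds both indexes on the example data, confirms they agree, and uses one of them to subset. The names index_sum, index_mean and result are my own, chosen only for illustration.

```r
# Example data from the question
a <- c(1.1, 1.2, 1.3, 1.4, 1.5)
b <- c(1.3, 1.4, 1.5, 1.6, 1.7)
c <- c(1.5, 1.6, 1.7, 1.8, 1.9)
data <- data.frame(a, b, c)

# Logical vector: which values are above the threshold
gt <- function(x, threshold) x > threshold

# Count-based index: at least 2/3 of the values in the row exceed 1.4
index_sum <- apply(data, 1, function(x) sum(gt(x, 1.4)) >= 2/3 * length(x))

# Proportion-based index: more than 0.6 of the values in the row exceed 1.4
index_mean <- apply(data, 1, function(x) mean(gt(x, 1.4)) > 0.6)

# Both indexes pick the same rows for this data
identical(index_sum, index_mean)

# Keep only the rows flagged TRUE
result <- data[index_mean, ]
result
```

With three columns the two conditions happen to coincide; with a different number of columns you would probably want to pick the cutoff deliberately rather than reuse 0.6.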
Upvotes: 0
Reputation: 92300
Maybe this (it should be more efficient, since rowSums is vectorized and saves you from needing an apply loop):
data[rowSums(data > 1.4) >= 2/3*ncol(data),]
## a b c
## 3 1.3 1.5 1.7
## 4 1.4 1.6 1.8
## 5 1.5 1.7 1.9
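If the one-liner is hard to read, it may help to split it into its intermediate steps. This is only a sketch of what happens inside; the variable names above, counts, needed and keep are mine.

```r
# Example data from the question
a <- c(1.1, 1.2, 1.3, 1.4, 1.5)
b <- c(1.3, 1.4, 1.5, 1.6, 1.7)
c <- c(1.5, 1.6, 1.7, 1.8, 1.9)
data <- data.frame(a, b, c)

# Step 1: compare every cell at once; gives a logical matrix shaped like data
above <- data > 1.4

# Step 2: count the TRUE values in each row (TRUE counts as 1, FALSE as 0)
counts <- rowSums(above)

# Step 3: how many columns must pass for a row to qualify
needed <- 2/3 * ncol(data)

# Step 4: logical vector marking the rows to keep, then subset
keep <- counts >= needed
data[keep, ]
```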
Or, if you prefer a function, you could try
myfunc <- function(x) x[rowSums(x > 1.4) >= 2/3*ncol(x), ]
myfunc(data)
## a b c
## 3 1.3 1.5 1.7
## 4 1.4 1.6 1.8
## 5 1.5 1.7 1.9
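Since the real data frame has many more columns, it may be handy to pass the threshold and the proportion as arguments instead of hard-coding them. A minimal sketch, assuming all columns are numeric; the name filter_rows and its arguments are hypothetical, not an existing function.

```r
# Example data from the question
a <- c(1.1, 1.2, 1.3, 1.4, 1.5)
b <- c(1.3, 1.4, 1.5, 1.6, 1.7)
c <- c(1.5, 1.6, 1.7, 1.8, 1.9)
data <- data.frame(a, b, c)

# Hypothetical helper: keep rows where at least `prop` of the columns
# are strictly greater than `threshold`
filter_rows <- function(df, threshold, prop) {
  keep <- rowSums(df > threshold) >= prop * ncol(df)
  # drop = FALSE keeps the result a data frame even with a single column
  df[keep, , drop = FALSE]
}

# Same condition as above: at least 2/3 of the columns > 1.4
two_thirds <- filter_rows(data, threshold = 1.4, prop = 2/3)

# Stricter condition: every column > 1.4
all_cols <- filter_rows(data, threshold = 1.4, prop = 1)
```

Note that prop * ncol(df) is a floating-point product, so for some proportions an exact tie might be affected by rounding.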
Upvotes: 1