Reputation: 1
I have a large data set, and I would like to remove every column in which fewer than 10% of the rows have values greater than 1. Please assist, thank you!
     X0610005C13Rik X0610007N19Rik X0610007P14Rik X0610009B22Rik
1013      0.9212730       5.098840       59.62392        55.9218
1014      0.2102610       1.507530       69.87635        48.7867
1024      0.9948520       1.168450       76.46345        65.7150
...
Upvotes: 0
Views: 159
Reputation: 2821
Here is a solution with sapply.
# some example data
set.seed(1)
dat <- as.data.frame(matrix(runif(200, 0.2, 1.1), ncol=5))
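# proportion of values greater than 1 in each column (NAs are ignored)
prop_large <- sapply(dat, function(x) mean(x > 1, na.rm = TRUE))
# keep only the columns where at least 10% of the rows exceed 1;
# drop = FALSE keeps a data frame even if a single column remains
dat <- dat[, prop_large >= 0.1, drop = FALSE]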
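The same filter can also be sketched with colMeans, since comparing a data frame with a number gives a logical matrix whose column means are the proportions of TRUE values. This is only a minimal sketch: it assumes every column is numeric, the helper name keep_expressed_cols and its arguments are made up for illustration, and ignoring missing values with na.rm = TRUE is an assumption about how NAs should be treated. A column that is entirely NA would need separate handling.

```r
# Hypothetical helper (not part of the answer above): keep the columns in
# which at least `min_prop` of the rows exceed `threshold`.
keep_expressed_cols <- function(df, threshold = 1, min_prop = 0.1) {
  # df > threshold is a logical matrix; colMeans gives the per-column
  # proportion of TRUE values (NAs are ignored, by assumption)
  prop_large <- colMeans(df > threshold, na.rm = TRUE)
  # drop = FALSE keeps a data frame even if only one column survives
  df[, prop_large >= min_prop, drop = FALSE]
}

# toy data: column a has 2 of 10 values above 1, column b has none
toy <- data.frame(a = c(2, 3, rep(0.5, 8)), b = rep(0.5, 10))
filtered <- keep_expressed_cols(toy)
names(filtered)
```

With the asker's data, the call would be keep_expressed_cols(dat); the thresholds are arguments, so the 10% cutoff can be changed without editing the function.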
Upvotes: 2