Bill Barrington

Reputation: 1

Drop columns in which a given number of values do not reach threshold

I have a large data set, and I would like to remove each column in which fewer than 10% of the rows have values greater than 1. Please assist, thank you!

         X0610005C13Rik X0610007N19Rik X0610007P14Rik X0610009B22Rik 
1013      0.9212730       5.098840       59.62392        55.9218       
1014      0.2102610       1.507530       69.87635        48.7867       
1024      0.9948520       1.168450       76.46345        65.7150   
...    

Upvotes: 0

Views: 159

Answers (1)

Remko Duursma

Reputation: 2821

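Here is a solution with sapply: compute the proportion of values greater than 1 in each column, then use that to index the data frame.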

# some example data
set.seed(1)
dat <- as.data.frame(matrix(runif(200, 0.2, 1.1), ncol=5))

# calculate the proportion of values in each column that are larger than 1
# (the mean of a logical vector is the fraction of TRUE values)
prop_large <- sapply(dat, function(x) mean(x > 1))

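# use it to index the data frame: keep columns where at least 10% of values exceed 1
# (drop = FALSE keeps a data frame even if only one column remains)
dat <- dat[, prop_large >= 0.1, drop = FALSE]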
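
If the real data contain missing values, the same idea can be wrapped in a small function built on colMeans. This is only a sketch under some assumptions: the proportion is taken over the non-missing values in each column, columns that are entirely NA are dropped, and the helper name keep_columns_above and the toy data are made up for illustration.

```r
# Hypothetical helper (not from the original answer): keep the columns in which
# at least `min_prop` of the non-missing values are greater than `threshold`.
keep_columns_above <- function(df, threshold = 1, min_prop = 0.1) {
  # df > threshold gives a logical matrix; colMeans returns the share of TRUE per column
  prop_large <- colMeans(df > threshold, na.rm = TRUE)
  # an all-NA column yields NaN; treat it as not meeting the criterion
  keep <- !is.na(prop_large) & prop_large >= min_prop
  # drop = FALSE keeps a data frame even when a single column survives
  df[, keep, drop = FALSE]
}

# small made-up example
toy <- data.frame(
  a = c(0.5, 0.2, 0.9, 0.1),  # 0 of 4 values > 1 -> dropped
  b = c(2.0, 0.3, 0.4, 0.5),  # 1 of 4 values > 1 (25%) -> kept
  c = c(5.1, NA, 7.3, 6.6)    # 3 of 3 non-missing values > 1 -> kept
)
result <- keep_columns_above(toy)
names(result)
```

Whether missing values should count against a column is a judgment call; drop na.rm = TRUE and handle NA explicitly if a different rule suits the data better.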

Upvotes: 2
