How to calculate a proportion of columns meeting a threshold in R?

Question

I have numeric data in R in the following form:

Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_5
10.67     7.91  6.98  7.93  7.70  11.15 8.58

I actually have 500 sets. I would like to calculate the proportion of sets that have a value greater than or equal to my Input_SNP column. For example, this row has 1 value (11.15) greater than or equal to 10.67, so I would like 1/(number of sets), which is 1/6 here. I'm sure this is simple; how can it be done?
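Written as a formula, with $n$ set columns (500 in the full data, 6 in the example row above) and $\mathbf{1}[\cdot]$ equal to 1 when its condition holds and 0 otherwise, the requested proportion is:

$$
p = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\left[\mathrm{Set}_j \ge \mathrm{Input\_SNP}\right],
\qquad \text{for the example row: } p = \frac{1}{6} \approx 0.167
$$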

user5363218 · Accepted Answer

data = read.table(header = TRUE, text = "Input_SNP     Set_1     Set_2     Set_3     Set_4     Set_5      Set_5
10.67          7.91      6.98      7.93      7.70      11.15      8.58")

# Compare all the values (except the first) to the first; >= so that ties count
data[,-1] >= data$Input_SNP
#      Set_1 Set_2 Set_3 Set_4 Set_5 Set_5.1
# [1,] FALSE FALSE FALSE FALSE  TRUE   FALSE

# Count the TRUE values (each TRUE counts as 1) and divide by the number of sets
sum(data[,-1] >= data$Input_SNP) / (ncol(data) - 1)
# 0.1666667
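Since the real data has 500 set columns rather than six, here is a minimal sketch on simulated values (the random data, the seed, and the names `set_values`, `hits` and `prop` are made up for illustration). Taking `mean()` of the logical comparison gives the proportion directly, because R treats TRUE as 1 and FALSE as 0:

```r
# Hypothetical data: one Input_SNP value and 500 simulated set columns
set.seed(42)
set_values <- round(runif(500, min = 5, max = 15), 2)
data <- as.data.frame(as.list(c(Input_SNP = 10.67,
                                setNames(set_values, paste0("Set_", 1:500)))))

# Compare every Set_ column with Input_SNP; >= counts ties as hits
hits <- unlist(data[, -1]) >= data$Input_SNP

# mean() of a logical vector is the proportion of TRUE values
prop <- mean(hits)
prop
```

Writing it as `sum(hits) / length(hits)` gives the same number; `mean()` simply folds in the division.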

If you don't want to use data frames, the following uses a matrix instead. It also handles several rows at once, comparing each row with its own Input_SNP value:

data = read.table(header = TRUE, text = "Input_SNP     Set_1     Set_2     Set_3     Set_4     Set_5      Set_5
10.67          7.91      6.98      7.93      7.70      11.15      8.58")

# Add a row of random values (different on every run) to check that each row
# is compared with its own Input_SNP value
data = rbind(data, runif(n = ncol(data), min = 5, max = 15))
data = as.matrix(data)

# Input_SNP    Set_1    Set_2    Set_3    Set_4     Set_5 Set_5.1
# 1 10.670000 7.910000  6.98000  7.93000 7.700000 11.150000  8.5800
# 2  6.670087 5.308156 12.81796 13.40233 7.753867  5.049444 14.5793



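# For each row, test whether each set value is >= that row's Input_SNP.
# apply() returns one column per input row, so transpose the result back.
logicalResults = apply(X = data, MARGIN = 1, FUN = function(x){x[1] <= x[-1]})
logicalResults = t(logicalResults)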

#   Set_1 Set_2 Set_3 Set_4 Set_5 Set_5.1
# 1 FALSE FALSE FALSE FALSE  TRUE   FALSE
# 2 FALSE  TRUE  TRUE  TRUE FALSE    TRUE


# logicalResults no longer contains the Input_SNP column, so count TRUE values
# across all of its columns (dropping the first column here would skip Set_1)
rowSums(logicalResults) / ncol(logicalResults)
# 1         2 
# 0.1666667 0.6666667
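For many rows, a vectorized sketch that avoids `apply()` and the transpose is also possible, assuming a numeric matrix whose first column is Input_SNP. The second row and the Set_1 to Set_6 names below are hypothetical, chosen only so the result is easy to check by hand:

```r
# The question's example row plus a second row with made-up values (hypothetical)
m <- rbind(
  c(10.67, 7.91, 6.98, 7.93,  7.70, 11.15, 8.58),
  c( 9.00, 9.50, 8.00, 9.00, 12.00,  3.00, 8.99)
)
colnames(m) <- c("Input_SNP", paste0("Set_", 1:6))

# m[, 1] holds one value per row; R recycles it down each column of m[, -1],
# so every Set_ value is compared with the Input_SNP value from its own row
hits <- m[, -1, drop = FALSE] >= m[, 1]

# rowMeans() of a logical matrix gives the proportion of TRUE values per row
props <- rowMeans(hits)
props
# [1] 0.1666667 0.5000000
```

This relies on R's recycling rule: a vector whose length equals the number of rows lines up with each column, so no per-row function call is needed.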
