Reputation: 1499
I have data in R in a numeric class in the form:
Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_5
10.67 7.91 6.98 7.93 7.70 11.15 8.58
I actually have 500 sets. I would like to calculate the proportion of sets that have a value greater than or equal to the value in my Input_SNP column. For example, this row has 1 value (11.15) greater than or equal to 10.67, so I would like 1/(number of sets). I'm sure this is simple; how can it be done?
Upvotes: 0
Views: 830
Reputation:
data = read.table(header = T, text = "Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_5
10.67 7.91 6.98 7.93 7.70 11.15 8.58")
# Compare all the values (except the first) to the first.
# The question asks for "greater than or equal to", hence >= rather than >
data[,-1] >= data$Input_SNP
# Set_1 Set_2 Set_3 Set_4 Set_5 Set_5.1
# [1,] FALSE FALSE FALSE FALSE TRUE FALSE
# Count the TRUE values and divide by the number of sets
sum(data[,-1] >= data$Input_SNP) / (ncol(data) - 1)
# 0.1666667
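For a single row, the same proportion can be sketched a little more compactly: R treats TRUE as 1 and FALSE as 0, so `mean()` of the logical comparison is the proportion. This is only an illustration; the last column is renamed `Set_6` here, which is an assumption about the intended column names, and `hits` and `prop` are made-up variable names.

```r
# Single-row example from the question. The last column is renamed Set_6
# (an assumption) so that the column names are unique.
data <- read.table(header = TRUE, text = "Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_6
10.67 7.91 6.98 7.93 7.70 11.15 8.58")

# Logical vector: is each set value >= the Input_SNP value?
hits <- unlist(data[1, -1]) >= data$Input_SNP[1]

# TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUEs
prop <- mean(hits)
prop
```

Here only 11.15 passes the test, so `prop` is 1 out of 6 sets.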
If you don't want to use data frames, the following uses a matrix:
data = read.table(header = T, text = "Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_5
10.67 7.91 6.98 7.93 7.70 11.15 8.58")
# Generate some further random data to verify correct row indexing
data = rbind(data, runif(n = ncol(data), min = 5, max = 15))
data = as.matrix(data)
# Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_5.1
# 1 10.670000 7.910000 6.98000 7.93000 7.700000 11.150000 8.5800
# 2 6.670087 5.308156 12.81796 13.40233 7.753867 5.049444 14.5793
logicalResults = apply(X = data, MARGIN = 1, FUN = function(x){x[1] <= x[-1]})
logicalResults = t(logicalResults)
# Set_1 Set_2 Set_3 Set_4 Set_5 Set_5.1
# 1 FALSE FALSE FALSE FALSE TRUE FALSE
# 2 FALSE TRUE TRUE TRUE FALSE TRUE
# logicalResults already excludes Input_SNP, so count every column (do not drop the first one)
apply(X = logicalResults, MARGIN = 1, FUN = function(x){sum(x)}) / ncol(logicalResults)
# 1 2
# 0.1666667 0.6666667
Upvotes: 1
Reputation: 28461
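Whether it is a data frame or a matrix, you can try the following (replace `>` with `>=` if values equal to Input_SNP should also count, as described in the question):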
rowMeans(df[,-1] > df[,1], na.rm=TRUE)
#[1] 0.1666667
Or, if we extend the data using the data from your last question, it still works:
rowMeans(df[,-1] > df[,1], na.rm=TRUE)
#[1] 0.4000000 1.0000000 NaN 0.0000000 0.2000000 0.2000000 0.1666667
And also to make sure it works for matrices:
mat <- as.matrix(df)
rowMeans(mat[,-1] > mat[,1], na.rm=TRUE)
#[1] 0.4000000 1.0000000 NaN 0.0000000 0.2000000 0.2000000 0.1666667
Extended data:
df <- read.table(text="Input_SNP Set_1 Set_2 Set_3 Set_4 Set_5 Set_6
1.09 0.162 NA 2.312 1.876 0.12 0.812
0.687 NA 0.987 1.32 1.11 1.04 NA
NA 1.890 0.923 1.43 0.900 2.02 2.7
2.801 0.642 0.791 0.812 NA 0.31 1.60
1.33 1.33 NA 1.22 0.23 0.18 1.77
2.91 1.00 1.651 NA 1.55 3.20 0.99
2.00 2.31 0.89 1.13 1.25 0.12 1.55", header=T)
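One point worth spelling out about `na.rm=TRUE`: missing comparisons are dropped, so each row's proportion is taken over its non-missing sets rather than over all sets. The sketch below contrasts the two denominators on a cut-down, hypothetical subset of the data above (two rows, three sets); `cmp`, `over_non_missing` and `over_all_sets` are illustrative names.

```r
# Hypothetical subset of the extended data: two rows, three sets
df <- read.table(text = "Input_SNP Set_1 Set_2 Set_3
1.09 0.162 NA 2.312
2.00 2.31 0.89 1.13", header = TRUE)

# Logical matrix; NA wherever a set value is missing
cmp <- df[, -1] > df[, 1]

# Denominator = number of non-missing comparisons in each row
over_non_missing <- rowMeans(cmp, na.rm = TRUE)

# Denominator = total number of sets (a missing value simply never counts as a hit)
over_all_sets <- rowSums(cmp, na.rm = TRUE) / ncol(cmp)

over_non_missing
over_all_sets
```

For the first row the two differ (1 of 2 non-missing sets versus 1 of 3 sets); which one is appropriate depends on how missing values should be interpreted.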
Update
If you are comparing the data frame to a separate numeric vector (one value per row), you do not need to index the vector, as it has no dimensions:
rowMeans(df[-1] > my_vector, na.rm=T)
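A minimal sketch of that vector comparison, assuming `my_vector` holds one threshold per row of the data frame. The data and the threshold values here are invented for illustration.

```r
# Hypothetical data: two rows, three sets
df <- read.table(text = "Input_SNP Set_1 Set_2 Set_3
1.09 0.162 1.5 2.312
2.00 2.31 0.89 1.13", header = TRUE)

# Hypothetical thresholds, one per row of df
my_vector <- c(1.0, 2.5)

# The comparison only lines up row by row if the lengths match
stopifnot(length(my_vector) == nrow(df))

# Each column of df[-1] is compared element-wise with my_vector
res <- rowMeans(df[-1] > my_vector, na.rm = TRUE)
res
```

In the first row two of the three sets exceed 1.0, and in the second row none exceed 2.5.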
Upvotes: 1