Subset rows based on a specific threshold value

Question

I want to subset the observations in each column of my data frame based on a threshold. Let me explain the question in a bit more detail.

I have a data frame with the methylation levels of 35 patients affected by lung adenocarcinoma. This is a subset of my data:

> df.met[1:5,1:5]
                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9  0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602

Now I need another object (with the same number of columns, but fewer rows, and a different number of rows in each column) that contains only the values greater than 0.1 from every column of the initial data frame.

My intention is to obtain an object like this one (I don't know if it is possible...):

                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243

In other words, I want to exclude the values smaller than 0.1 from my data frame.

Thank you so much!

akrun · Accepted Answer

If you want to keep only the rows in which every value is greater than 0.1, you may need

df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
#           A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
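
To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The intermediate names (`below`, `n_below`, `keep`, `res`) are only illustrative, and the data frame is rebuilt from the sample in the "data" section so the block runs on its own.

```r
# Sample data, copied from the "data" section of this answer
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

below   <- df.met <= 0.1     # logical matrix: TRUE where a value is at or below 0.1
n_below <- rowSums(below)    # number of such values in each row
keep    <- n_below == 0      # equivalent to !rowSums(df.met <= 0.1)

# drop = FALSE keeps the result a data frame even if only one column remains
res <- df.met[keep, , drop = FALSE]
res
```

Only paciente7 and paciente8 have no value at or below 0.1, so these are the two rows that remain.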

Update

Based on the edit to the question, you can instead replace the values at or below 0.1 with NA:

is.na(df.met) <- df.met <= 0.1
df.met
#              A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente6  0.3618447 0.4555788 0.6422624        NA 0.1501334
#paciente7  0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8  0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9         NA 0.5166676 0.8878207        NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA
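
Note that the `is.na<-` assignment modifies `df.met` in place. If the original values are still needed, one possible alternative (a sketch, not the only way) is `replace()`, which returns a modified copy. A data frame cannot hold columns of different lengths, so if a truly ragged result is wanted, a named list with one vector per column may be closer to what the question describes. The names `df.na` and `vals` below are only illustrative.

```r
# Sample data, copied from the "data" section of this answer
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Copy with values <= 0.1 set to NA; df.met itself is left untouched
df.na <- replace(df.met, df.met <= 0.1, NA)

# Ragged alternative: one named vector per column, holding only values > 0.1
vals <- lapply(df.met, function(x) setNames(x, rownames(df.met))[x > 0.1])
lengths(vals)
```

Each element of `vals` keeps the patient names, so it is still possible to tell which observation every retained value came from.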

Using data.table

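library(data.table)  # v1.9.5+
# Convert to a data.table by reference; row names are kept in the column "rn"
setDT(df.met, keep.rownames = TRUE)[]

# Column 1 is "rn", so start at column 2 and set values <= 0.1 to NA by reference
for (j in 2:ncol(df.met)) {
  set(df.met, i = which(df.met[[j]] <= 0.1), j = j, value = NA)
}

df.met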
 #          rn     A2BP1       A2M     A2ML1    A4GALT      AAAS
 #1:  paciente6 0.3618447 0.4555788 0.6422624        NA 0.1501334
 #2:  paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
 #3:  paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
 #4:  paciente9        NA 0.5166676 0.8878207        NA 0.1177907
 #5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA

data

df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497, 
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387, 
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839, 
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038, 
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923, 
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1", 
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6", 
"paciente7", "paciente8", "paciente9", "paciente10"))
