Reputation: 51
I have a data frame "data" with 60 rows (samples) and 20228 columns, where the first column is my target variable (an ordered factor: 0 or 1) and the other columns are my numeric features. I want to run feature selection with mRMRe inside a loop that performs 5-fold cross-validation, repeated 3 times, selecting 25 features each time. Here is the problematic part of my code:
library(caret)
library(mRMRe)
data <- read.csv("home/RNA_seq.csv", row.names=1, sep=";", stringsAsFactors=FALSE)
data <- data.frame(t(data))
data[,1] <- factor(data[,1])
data[,1] <- ordered(data[,1], levels = c("0", "1"))
features_select <- list()
r <- 5 # number of folds (5-fold cross-validation)
t <- 3 # the 5-fold cross-validation is repeated 3 times
for (j in 1:t){
  for (i in 1:r){
    # 5-fold cross-validation: split into training and test rows
    train.index <- createFolds(factor(data$Response), k = 5, list = TRUE, returnTrain = TRUE)
    datatrain <- data[train.index[[i]],]
    datatest <- data[-train.index[[i]],]
    # Feature selection on the training rows only
    data.mrmre.train <- mRMR.data(data=datatrain)
    res.fs.mrmr <- mRMR.classic(data=data.mrmre.train, target_indices=1, feature_count=25)
    selected.features.mrmre <- mRMRe::solutions(res.fs.mrmr)
    # Store the selected feature names in slot (j-1)*r+i, one slot per fold and repetition
    features_select[[((j-1)*r+i)]] <- res.fs.mrmr@feature_names[unlist(selected.features.mrmre)]
    print(features_select[[((j-1)*r+i)]])
    print(res.fs.mrmr)
  }
}
My problem is that sometimes my target variable, "Response" (column 1 of "data"), is itself selected by mRMRe. For example:
features_select:
[[1]]
[1] "AC137800.2" "AC007387.1" "AC079354.1" "AC145138.1" "RNA5SP370"
[6] "RNA5SP219" "AL022324.1" "AC023449.1" "AP000873.1" "AC020612.2"
[11] "RNA5SP473" "AC092810.1" "IGKV1D.37" "SST" "AC093331.1"
[16] "TRAJ34" "AC107983.1" "RPL39P" "HSBP1P1" "TRBJ1.6"
[21] "PHGR1" "RNA5SP435" "RNA5SP301" "AC005255.1" "KRT127P"
[[2]]
[1] "AC073869.8" "Response" "Response" "Response" "Response" "Response"
[7] "Response" "Response" "Response" "Response" "Response" "Response"
[13] "Response" "Response" "Response" "Response" "Response" "Response"
[19] "Response" "Response" "Response" "Response" "Response" "Response"
[25] "Response"
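To see which folds are affected without reading every printout, the selected sets can be checked for the target name. This is only a minimal base-R sketch: the short list below is a made-up stand-in for features_select, and target_name, has_target and bad_folds are names introduced here for illustration.

# Toy stand-in for the list built in the loop (hypothetical, shortened values).
features_select <- list(
  c("AC137800.2", "AC007387.1", "SST"),
  c("AC073869.8", "Response", "Response")
)
target_name <- "Response"
# TRUE for every fold whose selection contains the target column.
has_target <- vapply(features_select, function(x) target_name %in% x, logical(1))
# Positions in features_select where the target was selected.
bad_folds <- which(has_target)
print(bad_folds)

Because the list is filled at position (j-1)*r+i, each position returned here maps back to one repetition j and one fold i.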
Here is the output of mRMR.classic() in the first case and in the second (bad) case:
[[1]]
Formal class 'mRMRe.Filter' [package "mRMRe"] with 8 slots
..@ filters :List of 1
.. ..$ 1: int [1:25, 1] 18837 18781 15503 15526 17437 20028 18924 17133 17024 16104 ...
..@ scores :List of 1
.. ..$ 1: num [1:25, 1] 0.817 0.819 0.817 0.817 0.817 ...
..@ mi_matrix : num [1:20228, 1:20228] NA -0.3786 -0.1536 -0.0929 -0.0964 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
.. .. ..$ : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
..@ causality_list:List of 1
.. ..$ 1: num [1:20228] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..@ sample_names : chr [1:48] "Pt1_28" "Pt2_28" "Pt4_28" "Pt5_28" ...
..@ feature_names : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
..@ target_indices: int 1
..@ levels : int [1:25] 1 1 1 1 1 1 1 1 1 1 ...
[[2]]
Formal class 'mRMRe.Filter' [package "mRMRe"] with 8 slots
..@ filters :List of 1
.. ..$ 1: int [1:25, 1] 1 1 1 1 1 1 1 1 1 1 ...
..@ scores :List of 1
.. ..$ 1: num [1:25, 1] 0 0 0 0 0 0 0 0 0 0 ...
..@ mi_matrix : num [1:20228, 1:20228] NA -0.518 -0.246 -0.211 -0.204 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
.. .. ..$ : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
..@ causality_list:List of 1
.. ..$ 1: num [1:20228] NA NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..@ sample_names : chr [1:48] "Pt1_28" "Pt2_28" "Pt4_28" "Pt5_28" ...
..@ feature_names : chr [1:20228] "Response" "TMSB15B" "MATR3" "HSPA14" ...
..@ target_indices: int 1
..@ levels : int [1:25] 1 1 1 1 1 1 1 1 1 1 ...
This does not happen for the same values of i and j on every run. Do you have any idea where the problem is?
Thank you in advance!
Upvotes: 3
Views: 893
Reputation: 51
I got a response from the authors of the mRMRe package. The solution is to pass my target variable (the ordered factor) through the "strata" parameter of the mRMR.data() function. So I had to change:
data.mrmre.train <- mRMR.data(data=datatrain)
to:
data.mrmre.train <- mRMR.data(data=datatrain[,-1], strata=datatrain[,1])
For more details, see: https://github.com/bhklab/mRMRe/issues/27
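Before handing the two pieces to mRMR.data(), it may help to confirm that the target is no longer among the feature columns. The following is a minimal base-R sketch, not the package authors' code: the small data frame and the names GENE_A, GENE_B, features_only and strata_vec are hypothetical and only stand in for one training fold.

# Toy training fold (hypothetical values): column 1 is the ordered target.
datatrain <- data.frame(
  Response = ordered(c("0", "1", "0", "1"), levels = c("0", "1")),
  GENE_A = c(1.2, 3.4, 0.5, 2.2),
  GENE_B = c(0.1, 0.9, 0.3, 0.7)
)
# Feature columns only; drop = FALSE keeps a data frame even with one feature left.
features_only <- datatrain[, -1, drop = FALSE]
# Target column, to be supplied separately through strata=.
strata_vec <- datatrain[, 1]
# Sanity checks: the target is gone from the features and lines up row by row.
stopifnot(!("Response" %in% colnames(features_only)))
stopifnot(is.ordered(strata_vec), length(strata_vec) == nrow(features_only))
# The two objects would then be passed as in the changed line above:
# mRMR.data(data = features_only, strata = strata_vec)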
Upvotes: 2