Reputation: 806
I've got two sparse matrices, one for a training set and one for a test set, and I need to remove the columns in each that are not in the other, so that both end up with the same columns. At the moment I'm doing this with a loop, but I'm sure there is a more efficient way to do it:
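# take out features in the test set that are not in the training set
removerows <- integer(0)  # column indices to drop (must be initialised before the loop)
i <- 0
for (feature in testmatrix@Dimnames[2][[1]]) {
  i <- i + 1
  if (!(feature %in% trainmatrix@Dimnames[2][[1]])) {
    removerows <- c(removerows, i)
  }
}
# negative indexing with an empty vector would drop every column, so guard against it
if (length(removerows) > 0) {
  testmatrix <- testmatrix[, -removerows, drop = FALSE]
}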
# and vice versa...
Upvotes: 0
Views: 97
Reputation: 59980
To me it looks like all you want to do is keep the columns of testmatrix that also appear in trainmatrix, and vice versa. Since you want to apply this to both matrices, a quick way is to use intersect on the vectors of colnames from each matrix to find the column names they share, and then use that vector to subset:
# keep will be a vector of colnames that appear in BOTH train and test matrices
keep <- intersect( colnames(testmatrix) , colnames(trainmatrix) )
# Then subset both matrices on this vector
testmatrix <- testmatrix[ , keep ]
trainmatrix <- trainmatrix[ , keep ]
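For anyone who wants to try this on toy data, here is a minimal sketch. It assumes the matrices come from the Matrix package and carry column names; the feature names "a" to "d" and the values are made up purely for illustration.

```r
library(Matrix)

# Hypothetical toy matrices: train has features a, b, c; test has b, c, d
trainmatrix <- sparseMatrix(
  i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 2, 3),
  dims = c(2, 3), dimnames = list(NULL, c("a", "b", "c"))
)
testmatrix <- sparseMatrix(
  i = c(1, 1, 2), j = c(1, 2, 3), x = c(4, 5, 6),
  dims = c(2, 3), dimnames = list(NULL, c("b", "c", "d"))
)

# Column names present in both matrices
keep <- intersect(colnames(trainmatrix), colnames(testmatrix))

# drop = FALSE keeps the result a matrix even if only one column survives
trainmatrix <- trainmatrix[, keep, drop = FALSE]
testmatrix  <- testmatrix[, keep, drop = FALSE]

# Both matrices now have the same columns in the same order
stopifnot(identical(colnames(trainmatrix), colnames(testmatrix)))
```

A side benefit of subsetting by name rather than deleting by index is that both matrices end up with their columns in the same order, which a model fitted on one and applied to the other will usually need.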
Upvotes: 2