paulusm

Reputation: 806

How to equalise the columns of two sparse matrices

I've got two sparse matrices, for a training and a test set, and I need to remove from each the columns that are not in the other, so that both end up with the same columns. At the moment I'm doing this with a loop, but I'm sure there is a more efficient way:

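# take out features in the test set that are not in the training set
  removecols <- c()   # must exist before the loop appends to it
  i <- 0
  for (feature in testmatrix@Dimnames[2][[1]]) {
    i <- i + 1
    if (!(feature %in% trainmatrix@Dimnames[2][[1]])) {
      removecols <- c(removecols, i)
    }
  }
  # only subset if there is something to remove: an empty negative index drops every column
  if (length(removecols) > 0) testmatrix <- testmatrix[, -removecols]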

# and vice versa...

Upvotes: 0

Views: 97

Answers (1)

Simon O'Hanlon

Reputation: 59980

To me it looks like all you want to do is keep the columns of testmatrix that also appear in trainmatrix, and vice versa. Since you want to apply this to both matrices, a quick way is to use intersect on the colnames of each matrix to find the names they share, and then subset both matrices on that vector:

#  keep will be a vector of colnames that appear in BOTH train and test matrices
keep <- intersect( colnames(testmatrix) , colnames(trainmatrix) )

#  Then subset on this vector
testmatrix <- testmatrix[ , keep ]
trainmatrix <- trainmatrix[ , keep ]
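
For anyone who wants to try this end to end, here is a small self-contained sketch. It assumes the matrices come from the Matrix package (the question only says "sparse matrices"), and the toy data and the names train_demo and test_demo are made up for illustration. drop = FALSE is added as a precaution so the result stays a matrix even if only one column is shared.

```r
library(Matrix)

# Hypothetical toy data: two sparse matrices with partly overlapping column names
train_demo <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 2, 3),
                           dims = c(3, 3),
                           dimnames = list(NULL, c("a", "b", "c")))
test_demo  <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(4, 5),
                           dims = c(2, 3),
                           dimnames = list(NULL, c("b", "c", "d")))

# Column names present in BOTH matrices
keep <- intersect(colnames(train_demo), colnames(test_demo))

# Subset by name; drop = FALSE keeps a matrix even for a single shared column
train_demo <- train_demo[, keep, drop = FALSE]
test_demo  <- test_demo[, keep, drop = FALSE]

# Both matrices now have the same columns in the same order
stopifnot(identical(colnames(train_demo), colnames(test_demo)))
```

Subsetting by name also puts the shared columns in the same order in both matrices, which the index-based loop in the question does not guarantee.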

Upvotes: 2
