How to incorporate features from a latent semantic analysis as independent variables in a predictive model

Question

I am trying to run a logistic regression on text data in R. I have built a term-document matrix and the corresponding latent semantic space. As I understand it, LSA derives 'concepts' from 'terms', which can help with dimension reduction. Here's my code:

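library(tm)   # TermDocumentMatrix(), removeSparseTerms()
library(lsa)  # lsa(), dimcalc_share()
# Terms as rows, documents as columns; myngramtoken and myweight are defined elsewhere
tdm = TermDocumentMatrix(corpus, control = list(tokenize = myngramtoken, weighting = myweight))
tdm = removeSparseTerms(tdm, 0.98)      # drop terms absent from more than 98% of documents
tdm = as.matrix(tdm)
tdm.lsa = lsa(tdm, dimcalc_share())     # SVD; dimcalc_share() picks how many dimensions to keep
tdm.lsa_tk = as.data.frame(tdm.lsa$tk)  # term coordinates (one row per term)
tdm.lsa_dk = as.data.frame(tdm.lsa$dk)  # document coordinates (one row per document)
tdm.lsa_sk = as.data.frame(tdm.lsa$sk)  # singular values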
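The 'concepts from terms' idea can be made concrete with a minimal base-R sketch of LSA as a truncated singular value decomposition. It assumes a tiny made-up term-document matrix `X` and a hand-picked `k = 2`. To my understanding `lsa()` does essentially this and stores the truncated factors in `$tk`, `$sk` and `$dk`, but the toy counts and `k` here are purely illustrative.

```r
# Made-up term-document matrix: 6 terms (rows) x 4 documents (columns).
X <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              1, 1, 0, 1,
              0, 0, 2, 1,
              0, 0, 1, 2,
              0, 1, 1, 1),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("term", 1:6), paste0("doc", 1:4)))

k <- 2                    # number of latent "concepts" kept (chosen by hand here)
s <- svd(X)               # X = U %*% diag(d) %*% t(V)

tk <- s$u[, 1:k]          # one row per TERM      (6 x 2)
sk <- s$d[1:k]            # k singular values     (length 2)
dk <- s$v[, 1:k]          # one row per DOCUMENT  (4 x 2)
rownames(dk) <- colnames(X)

# The k concepts give a rank-k approximation of the original matrix.
X_k <- tk %*% diag(sk) %*% t(dk)

# Its Frobenius error equals the norm of the discarded singular values.
err <- sqrt(sum((X - X_k)^2))
```

The shapes are the point: `dk` has one row per document, while `tk` has one row per term, so only the document coordinates line up with one outcome per document.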

My code gives features named V1, V2, V3, ..., V21. Is it possible to use these as the independent variables in my logistic regression? If so, how can I do it?
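A minimal sketch of how document coordinates like `tdm.lsa_dk` could feed `glm()`, assuming a simulated 30 x 40 count matrix standing in for the filtered term-document matrix, a hand-picked `k = 3`, and a simulated 0/1 `label` (all hypothetical). `project_doc()` is a hypothetical helper, not part of the lsa package; it relies on the SVD identity V = t(X) %*% U %*% diag(1/d) to place a new document in the same space.

```r
set.seed(42)

# Toy stand-in for the weighted, filtered matrix: 30 terms x 40 documents.
n_terms <- 30
n_docs  <- 40
X <- matrix(rpois(n_terms * n_docs, lambda = 1), nrow = n_terms)

# LSA step: keep k = 3 dimensions (plays the role of lsa(tdm, dimcalc_share())).
k  <- 3
s  <- svd(X)
tk <- s$u[, 1:k]
sk <- s$d[1:k]
dk <- s$v[, 1:k]

# One row per document, columns V1..Vk (like tdm.lsa_dk in the question).
doc_feats <- as.data.frame(dk)

# Hypothetical binary outcome, one label per document, in the same order
# as the columns of X. Simulated here only so the example runs.
doc_feats$label <- rbinom(n_docs, size = 1, prob = 0.5)

# Logistic regression with the LSA dimensions as predictors.
fit <- glm(label ~ ., data = doc_feats, family = binomial)
summary(fit)

# Hypothetical helper: project a new term vector (same terms, same weighting)
# into the LSA space as t(x_new) %*% tk divided by sk.
project_doc <- function(x_new, tk, sk) {
  coords <- drop(crossprod(x_new, tk)) / sk
  as.data.frame(t(coords))   # 1-row data frame with columns V1..Vk
}

x_new <- rpois(n_terms, lambda = 1)
new_feats <- project_doc(x_new, tk, sk)
predict(fit, newdata = new_feats, type = "response")   # predicted probability
```

The labels must be in the same order as the documents (columns) of the term-document matrix, and a new document must go through the same tokenizer, weighting and term list before projection; otherwise its coordinates are not comparable with the ones the model was fitted on.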
