dank


From vector of strings to model matrix in R

I have a vector of 16163 strings that looks like this:

sentencevector <- c('decided clean debt get finances together Thank consideration',
                    'I stable job I will never get laid I fixed',
                    'Using pay existing loans credit card debt All higher',
                    'Substantially lower giving peace mind My job stable'
                    # ... (16163 sentences in total)
                    )

The sentences contain arbitrary words and vary in length.

From that vector I want a dummy-variable (0/1) matrix with one row per sentence and one column per word: a cell is 1 if the word appears in that sentence and 0 if it does not.

The first row of the matrix would look like this:

Data <- data.frame(
  X = 'decided clean debt get finances together thank consideration',
  decided = 1,
  clean = 1,
  dance = 0,
  debt = 1
  # ... one column per word in the vocabulary
)

I built a vector of the unique words in sentencevector, called universe, and tried to create an empty data frame with one column per word:

df <- setNames(data.frame(matrix(ncol = length(universe), nrow = length(sentencevector))), universe)

Then I tried to fill the data frame with a nested loop, but with 16163 sentences it takes too long.
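For comparison, here is a minimal vectorized base-R sketch that avoids the nested loop. It is not from the original thread. It assumes that lower-casing and splitting on whitespace is an acceptable tokenization (so 'Thank' and 'thank' count as the same word), and the names tokens, doc_id, words, counts and dummy are illustrative. The idea is to turn the sentences into one long list of (sentence, word) pairs and cross-tabulate them in a single table() call.

```r
# Toy input: the four sentences shown above
sentencevector <- c('decided clean debt get finances together Thank consideration',
                    'I stable job I will never get laid I fixed',
                    'Using pay existing loans credit card debt All higher',
                    'Substantially lower giving peace mind My job stable')

# Tokenize (assumption): lower-case, then split on runs of whitespace
tokens <- strsplit(tolower(sentencevector), "\\s+")

# Vocabulary: every distinct word, sorted so the column order is stable
universe <- sort(unique(unlist(tokens)))

# Long format: one (sentence index, word) pair per token
doc_id <- rep(seq_along(tokens), lengths(tokens))
words  <- unlist(tokens)

# Cross-tabulate sentence index against word: word counts per sentence
counts <- table(factor(doc_id, levels = seq_along(tokens)),
                factor(words, levels = universe))

# Drop the "table" class and collapse counts to 0/1
# (e.g. "i" occurs three times in sentence 2 but becomes a single 1)
dummy <- (unclass(counts) > 0) * 1L

# Optional: attach the sentences, like the desired Data data frame
Data <- data.frame(X = sentencevector, dummy, check.names = FALSE)
```

Because the levels of both factors are fixed, every sentence gets a row and every word in universe gets a column, even if some cell counts are zero.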


Answers (1)

napsta32

Use DocumentTermMatrix() or TermDocumentMatrix() from the tm package: https://www.rdocumentation.org/packages/tm/versions/0.6-2/topics/TermDocumentMatrix

Treat each sentence as a document: build a corpus from the vector of sentences and pass it to one of these functions. The result holds word counts per sentence, which you can then filter however you need. For example, replacing every value greater than 0 with 1 (if val > 0 then 1 else 0) gives the dummy matrix.
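A minimal sketch of that approach, assuming the tm package is installed. The control settings are my additions, not the answer's: wordLengths = c(1, Inf) keeps short words such as "i" and "my", which tm drops by default (minimum word length 3), and tolower = TRUE (already the default) makes 'Thank' and 'thank' the same term. The 0/1 step is the "if val > 0 then 1 else 0" filter described above.

```r
library(tm)  # assumed to be installed

# Toy input: the four sentences from the question
sentencevector <- c('decided clean debt get finances together Thank consideration',
                    'I stable job I will never get laid I fixed',
                    'Using pay existing loans credit card debt All higher',
                    'Substantially lower giving peace mind My job stable')

# Each sentence becomes one document in an in-memory corpus
corpus <- VCorpus(VectorSource(sentencevector))

# Rows = sentences, columns = terms, cells = term counts.
# wordLengths = c(1, Inf) keeps one- and two-letter words.
dtm <- DocumentTermMatrix(corpus,
                          control = list(tolower = TRUE, wordLengths = c(1, Inf)))

# Convert to a dense matrix and apply "if val > 0 then 1 else 0"
dummy <- (as.matrix(dtm) > 0) * 1L
```

Alternatively, adding weighting = weightBin to the control list should give 0/1 entries directly. Note that as.matrix() produces a dense matrix, which for 16163 sentences and a large vocabulary can take a lot of memory.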

Here is a tutorial (somewhat involved): https://rpubs.com/MajstorMaestro/256588
