How to match tokens in document term matrix to a separate data frame (of POS codes)

Question

Basically I have my bag of words:

library(tm)

# 'text' is a character vector with one element per document
source <- VectorSource(text)
corpus <- Corpus(source)
corpus <- tm_map(corpus, content_transformer(tolower))
dtm <- DocumentTermMatrix(corpus)

etc etc.

And I have a data frame consisting of just two columns, which I pulled from a SQLite database. Column 1 is a list of hundreds of words, and Column 2 is each word's corresponding Part of Speech (POS) code.

I am trying to match every token in my dtm to the identical term in column 1 of the data frame, so that each token can then be paired with its corresponding POS code. Essentially, the data frame is like a dictionary, and I want to look up each token in my dtm to get its definition.
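Since the data frame acts as a lookup table, an exact lookup with base R's `match()` is one way to pair terms with codes, with no pattern matching involved. This is a minimal sketch under stated assumptions: the dictionary columns are assumed to be named `word` and `pos` (hypothetical names; the question doesn't give them), and a small made-up vector stands in for the dtm's vocabulary, which tm exposes via `Terms(dtm)`. The dictionary is lowercased because the corpus was passed through `tolower`.

```r
# Stand-in for Terms(dtm): the dtm's vocabulary (made-up terms).
# The corpus went through tolower, so these are already lowercase.
terms <- c("cat", "quickly", "run", "the", "zebra")

# Hypothetical dictionary pulled from SQLite: one row per word.
# Column names 'word' and 'pos' are assumptions.
pos_dict <- data.frame(
  word = c("The", "cat", "run", "quickly"),
  pos  = c("DT", "NN", "VB", "RB"),
  stringsAsFactors = FALSE
)

# Lowercase the dictionary words so they line up with the corpus,
# then find each term's row in the dictionary (NA if the term is absent).
idx <- match(terms, tolower(pos_dict$word))

# One row per dtm term with its POS code.
term_pos <- data.frame(
  term = terms,
  pos  = pos_dict$pos[idx],
  stringsAsFactors = FALSE
)
term_pos
```

Terms with no dictionary entry come back with `pos` set to `NA`, so gaps in the dictionary can be listed with `term_pos[is.na(term_pos$pos), ]`. If a word appears more than once in the dictionary, `match()` returns only its first occurrence.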

I tried a bunch of grep functions to do this, but to no avail. Does anyone have thoughts on the best way to approach this?
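A likely reason grep-based attempts struggle here is that `grep()`/`grepl()` do pattern matching, so a term like `"cat"` also hits `"category"` (`grepl("cat", "category")` is `TRUE`). An exact join avoids that. This sketch uses base R's `merge()` to attach POS codes to per-term counts; it assumes the same hypothetical `word`/`pos` column names, and a small hand-built matrix stands in for `as.matrix(dtm)` (for a large dtm, converting to a dense matrix may use a lot of memory).

```r
# Stand-in for as.matrix(dtm): 2 documents x 4 terms (made-up counts).
m <- matrix(c(1, 0, 2, 1,
              0, 1, 1, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(Docs  = c("1", "2"),
                            Terms = c("cat", "quickly", "run", "zebra")))

# Total count of each term across all documents.
freq <- data.frame(term  = colnames(m),
                   count = unname(colSums(m)),
                   stringsAsFactors = FALSE)

# Hypothetical POS dictionary; lowercased to match the tolower'd corpus.
pos_dict <- data.frame(word = c("The", "cat", "run", "quickly"),
                       pos  = c("DT", "NN", "VB", "RB"),
                       stringsAsFactors = FALSE)
pos_dict$word <- tolower(pos_dict$word)

# Exact join on the term; all.x = TRUE keeps terms that have
# no dictionary entry (their pos is NA) instead of dropping them.
result <- merge(freq, pos_dict, by.x = "term", by.y = "word", all.x = TRUE)
result
```

Dictionary words that never occur in the dtm (here `"the"`) are left out, because only `all.x` is set; adding `all.y = TRUE` would keep them as well.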

Thanks!
