Reputation: 11
I would like to create a term-document matrix (TDM) from a text using specific phrases (two or more words combined) instead of single words. The phrases could be, for example, "climate change", "global worming", "land use", etc. The examples I have seen all use single words.
tabela = DocumentTermMatrix(textolimpo,
                            list(dictionary = c("climate change", "global worming", "land use")))
I would appreciate it if someone could help me.
Cheers.
Rafael
Upvotes: 1
Views: 94
Reputation: 54237
I recommend quanteda:
library(quanteda)
textolimpo <- c("This climate change concerns me. This climate changes", "Wormed: global worming increased")
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            dictionary = list(climate = "climate_change",
                              warm = "global_worming"),
            valuetype = "regex"))
# 2 x 2 sparse Matrix of class "dfmSparse"
#        features
# docs    climate warm
#   text1       2    0
#   text2       0    1
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            thesaurus = list(climate = "climate_change",
                             warm = "global_worming"),
            valuetype = "regex"))
# 2 x 8 sparse Matrix of class "dfmSparse"
#       this_climate change_concerns concerns_me me_this wormed_global worming_increased CLIMATE WARM
# text1            2               1           1       1             0                 0       2    0
# text2            0               0           0       0             1                 1       0    1
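The calls above rely on the `ngrams`, `dictionary` and `thesaurus` arguments of `dfm()` as they existed when this was written. As a minimal sketch, assuming a recent quanteda release (version 3 or later) where these steps are done on a tokens object instead, the dictionary variant could look roughly like this. The names `toks`, `dict` and `phrase_dfm` are only illustrative, and the glob pattern `change*` is my substitute for the regex matching used above:

```r
library(quanteda)

textolimpo <- c("This climate change concerns me. This climate changes",
                "Wormed: global worming increased")

# Tokenize first; multi-word matching is done on the tokens object
toks <- tokens(textolimpo, remove_punct = TRUE)

# Multi-word dictionary values are written with a space between the words;
# "change*" is a glob pattern that also matches "changes"
dict <- dictionary(list(climate = "climate change*",
                        warm = "global worming"))

# Replace each matched phrase with its dictionary key, drop everything else
toks_dict <- tokens_lookup(toks, dictionary = dict, valuetype = "glob")

# Count the keys per document
phrase_dfm <- dfm(toks_dict)
print(phrase_dfm)
```

If the setup assumptions hold, `phrase_dfm` should contain the same per-document counts as the first matrix above. To keep the other terms as well, as in the thesaurus example, `tokens_lookup()` has an `exclusive = FALSE` option.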
Upvotes: 2