Diego de Lima

Reputation: 471

What's making the texts lowercase in this corpus, and how can I turn them uppercase?

I'm trying to build a word cloud in R, but it's returning only lowercase texts.

library(readxl)
library(tm)
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()

sheet <- read_excel('list_products.xls', skip = 4)
products <- c(sheet$Cod)
products <- Corpus(VectorSource(products))
c_words <- brewer.pal(8, 'Set2')
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

I've tried putting the following line before the wordcloud call, but it doesn't work:

products <- tm_map(products, content_transformer(toupper))

What's making the texts lowercase, and what should I do to turn them uppercase?

Upvotes: 1

Views: 248

Answers (1)

LocoGris

Reputation: 4480

As explained in Make all words uppercase in Wordcloud in R, TermDocumentMatrix(corpus) lowercases words by default. If you look at the source of wordcloud (for example with trace(wordcloud, edit = TRUE)), you'll see that when the freq argument is missing, it runs tdm <- tm::TermDocumentMatrix(corpus) with the default settings, so your words are lowercased at that point even if the corpus itself is uppercase.
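The default lowercasing can be seen on a tiny corpus. The sketch below (the two sentences are made up for illustration) builds the term-document matrix once with the default control and once with tolower = FALSE, then lists the terms of each:

```r
library(tm)

# Toy corpus, made up for illustration; the text is already uppercase
docs <- Corpus(VectorSource(c("APPLE BANANA APPLE", "BANANA CHERRY")))

# Default control list: terms are converted to lowercase
tdm_default <- TermDocumentMatrix(docs)

# Same corpus, but with the lowercasing step switched off
tdm_upper <- TermDocumentMatrix(docs, control = list(tolower = FALSE))

print(Terms(tdm_default))
print(Terms(tdm_upper))
```

This is the same TermDocumentMatrix call that wordcloud makes internally when freq is missing, which is why the uppercase conversion done earlier with tm_map is lost.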

You have two options to solve this. The first is to pass words and freq to wordcloud instead of the corpus:

library(tm)
library(wordcloud)  # also attaches RColorBrewer for brewer.pal()

# Using this text because the question does not include a reproducible example
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# Build the term-document matrix ourselves, without lowercasing
tdm <- tm::TermDocumentMatrix(products, control = list(tolower = FALSE))
# Term frequencies, named by the (uppercase) terms
freq_corpus <- slam::row_sums(tdm)
wordcloud(names(freq_corpus), freq_corpus, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

And you will get:

[Image: the resulting word cloud, with the words in uppercase]
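If you need the first option in several places, it can be wrapped in a small function so the package's wordcloud is never modified. This is a minimal sketch: wordcloud_keep_case is a hypothetical name, the toy text is made up, and the example draws to a null PDF device only so it can run without a screen:

```r
library(tm)
library(wordcloud)  # also attaches RColorBrewer for brewer.pal()

# Hypothetical helper: count terms without lowercasing, then pass
# words + frequencies to wordcloud() so it never builds its own TDM
wordcloud_keep_case <- function(corpus, ...) {
  tdm <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
  freq <- slam::row_sums(tdm)
  wordcloud(names(freq), freq, ...)
  invisible(freq)  # return the frequencies for inspection
}

# Usage on a toy corpus (made-up, already uppercase text)
docs <- Corpus(VectorSource(c("APPLE BANANA APPLE", "BANANA CHERRY APPLE")))
pdf(NULL)  # null graphics device: nothing is written to disk
freq <- wordcloud_keep_case(docs, min.freq = 1, colors = brewer.pal(8, "Set2"))
invisible(dev.off())
print(freq)
```

Any extra arguments (min.freq, max.words, scale, colors, ...) are forwarded unchanged to wordcloud through `...`.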

The second option is to modify wordcloud:

First run trace(wordcloud, edit = TRUE), then in the editor replace line 21 with:

tdm <- tm::TermDocumentMatrix(corpus, control = list(tolower = FALSE))

Save the changes and then run:

library(tm)
library(wordcloud)  # also attaches RColorBrewer for brewer.pal()

filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# Pass the corpus itself: the edited wordcloud builds the TDM with tolower = FALSE
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

You will get something like:

[Image: the resulting word cloud, with the words in uppercase]

Upvotes: 1
