I am building a word cloud in R. My data contain many occurrences of "AI", but the word cloud does not show this word, even though ChatGPT told me that "AI" appears 43 times in my data. Here is my code:
```r
install.packages("wordcloud")
install.packages("tm")
install.packages("RColorBrewer")
library(wordcloud)
library(tm)
library(ggplot2)
library(readxl)
library(RColorBrewer)
##### Good source: https://www.youtube.com/watch?v=oVVvG035vQc
##################################
# Read the data from the Excel file
data <- read_excel("/Users/home/Dropbox/ATD/ATD24 In-Person & Virtual for Delegations.xlsx", sheet = "1. Schedule")
# Extract the session title column
# Replace `Session Title (Session)` with the exact name of your column
titles <- data$`Session Title (Session)`
# Create a text corpus
corpus <- Corpus(VectorSource(titles))
# Define custom stopwords
my_stopwords <- c("learning", "development", "inperson", "streamed", "live", "design", "training", "new", "better", "can", "beyond", "like", "know", "without", "almost", "don't")
# Define a color palette
colors <- brewer.pal(8, "Dark2") # Choose a palette from RColorBrewer
# To get more colors you can repeat the palette
colors <- colorRampPalette(colors)(100) # Adjust the number to control color variations
# Preprocess the text data
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), my_stopwords))
corpus[[83]][1]  # inspect row 83 after preprocessing
## Create the frequency table
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# View(d)
# Generate the word cloud
wordcloud(corpus, min.freq = 1, max.words = 500, random.order = FALSE, rot.per = 0.35,
          scale = c(2, 0.1), use.r.layout = FALSE, colors = brewer.pal(8, "Set2"))
```
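One way to check the "43" figure in R instead of relying on ChatGPT is sketched below. It assumes the figure means whole-word, case-sensitive occurrences of "AI" in the raw (unprocessed) titles. The `sample_titles` vector is hypothetical: only the first title comes from the data described here, and in practice it would be replaced by `titles` from the code above.

```r
# Hypothetical sample; replace with `titles` from the original code.
sample_titles <- c("Harness AI to Transform Your Learning Ecosystem",  # row 83
                   "AI and the Future of Work",                        # made-up example
                   "Coaching Skills for Managers")                     # made-up example

# Count whole-word, case-sensitive matches of "AI" in each title.
# \\b marks word boundaries, so a word like "AIM" is not counted.
ai_hits <- lengths(regmatches(sample_titles,
                              gregexpr("\\bAI\\b", sample_titles, perl = TRUE)))

ai_hits       # occurrences per title
sum(ai_hits)  # total occurrences across all titles
```

This counts occurrences, not titles; `sum(ai_hits > 0)` would give the number of titles that mention "AI" at least once.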
For example, row 83 contains "Harness AI to Transform Your Learning Ecosystem". When I build `d` for that row alone with the code below, I get ecosystem, harness, learning, transform, and your. So "AI" is the only word I lose (I think "to" is removed because of my stopwords).
```r
tdm <- TermDocumentMatrix(corpus[[83]])
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
```
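A likely cause, sketched here on the assumption that the installed tm version uses its documented defaults: `TermDocumentMatrix()` discards terms shorter than 3 characters (the `wordLengths` control option defaults to `c(3, Inf)`), and after `tolower` "AI" becomes the two-letter term "ai". The sketch reproduces this on the row 83 title only; `doc`, `corp`, `tdm2`, `v2` and `d2` are illustrative names, and the other preprocessing steps are left out.

```r
library(tm)

# Title from row 83, lower-cased as in the original pipeline.
doc  <- tolower("Harness AI to Transform Your Learning Ecosystem")
corp <- Corpus(VectorSource(doc))

# Default settings: minimum word length is 3, so "ai" (and "to") are dropped.
default_terms <- Terms(TermDocumentMatrix(corp))

# Lowering the minimum word length to 2 keeps two-letter terms such as "ai".
tdm2 <- TermDocumentMatrix(corp, control = list(wordLengths = c(2, Inf)))
short_terms <- Terms(tdm2)

# Same kind of frequency table as `d`, built from the adjusted matrix.
v2 <- sort(rowSums(as.matrix(tdm2)), decreasing = TRUE)
d2 <- data.frame(word = names(v2), freq = v2)

"ai" %in% default_terms  # FALSE
"ai" %in% short_terms    # TRUE
```

Judging from the wordcloud source, when `freq` is not supplied, `wordcloud(corpus, ...)` builds its own term-document matrix with default settings, so the cloud would still drop "ai". Passing a table built with the adjusted matrix, e.g. `wordcloud(words = d2$word, freq = d2$freq, min.freq = 1, ...)`, is one way around that. Keeping two-letter terms also keeps any other short words that are not stopwords, so some of them may need to be added to `my_stopwords`.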
Thank you for your help in advance.
R word cloud removes specific words