Reputation: 58
I am doing some Twitter analysis in R. Right now I'm getting the tweets from an account, but I want to search for one or more specific words inside those tweets and then make a plot that shows how often each word has been repeated over a period of time, for example one week.
This is my script:
# install libraries
library(wordcloud)
library(SnowballC)
library(rtweet)
library(tm)
library(RColorBrewer)
library(tidytext)
library(dplyr)
library(wordcloud2)
library(stringr)
library(qdapRegex)
# Identification and retrieval of tokens
appname <- "AnalisisTwitSent"
consumer_key <- "XXXXXXXXXXXXXXXXXXXXXXXXX"
consumer_secret <- "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
access_token <- "XXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
access_secret <- "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
twitter_token <- create_token(app = appname,
                              consumer_key = consumer_key,
                              consumer_secret = consumer_secret,
                              access_token = access_token,
                              access_secret = access_secret,
                              set_renv = TRUE)
ver_palabras_comunes_nube <- function(busqueda, cantidad) {
  # Get tweets
  #tweets <- get_timeline(usuario, n = cantidad,
  #                       parse = TRUE, check = TRUE,
  #                       include_rts = TRUE)
  tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)
  text <- str_c(tweets$text, collapse = " ")  # collapse with a space so tweets don't fuse together
  # continue cleaning the text
  text <- text %>%
    str_remove_all("\\n") %>%                  # remove linebreaks
    rm_twitter_url() %>%                       # remove URLs
    rm_url() %>%
    str_remove_all("#\\S+") %>%                # remove any hashtags
    str_remove_all("@\\S+") %>%                # remove any @ mentions
    removeWords(stopwords("spanish")) %>%      # remove common words (a, the, it etc.)
    removeNumbers() %>%
    stripWhitespace() %>%
    removeWords(c("amp"))                      # final cleanup of other small leftovers
  text <- gsub("\\p{So}|\\p{Cn}", "", text, perl = TRUE)  # assign the result back, or it is discarded
  text <- rm_emoticon(text, replacement = "")
  # Convert the data into a summary table
  textCorpus <-
    Corpus(VectorSource(text)) %>%
    TermDocumentMatrix() %>%
    as.matrix()
  textCorpus <- sort(rowSums(textCorpus), decreasing = TRUE)
  textCorpus <- data.frame(word = names(textCorpus), freq = textCorpus, row.names = NULL)
  wordcloud <- wordcloud2(data = textCorpus, minRotation = 0, maxRotation = 0)
  wordcloud
}
Upvotes: 4
Views: 349
Reputation: 12420
To get a frequency plot over time for specific words, you really only have to count how often they appear in each time slot and then plot that. I'm using the tidytext
package here, which works really nicely for this. But you could also think about just using stringr::str_count()
(watch out for correct tokenisation in that case, though). You put your code in a function, which isn't really necessary here, but I wrote the code so you can quickly put it back into a function if you like.
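If you do go the stringr::str_count() route, here is a minimal sketch (assuming tweets and pattern as defined in the code below) that addresses the tokenisation caveat with \b word boundaries:

```r
library(stringr)

# Count whole-word matches only: the \\b boundaries stop "to"
# from also matching inside longer words like "together".
sapply(pattern, function(w) {
  sum(str_count(tweets$text, regex(paste0("\\b", w, "\\b"))))
})
```

This gives total counts per word; you would still need to group by a time column yourself, which is exactly what the tidytext version below does for you.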
library(rtweet)
library(tidyverse)
library(tidytext)
# define variables
busqueda <- "Poppycock"
cantidad <- 100
pattern <- c("is", "to")
# query tweets
tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)
# count the occurrence of the pattern words
pattern_df <- tweets %>%
  select(status_id, text, created_at) %>%         # only keep data columns we need later
  unnest_tokens(word, text) %>%                   # split the text into tokens (words)
  filter(word %in% pattern) %>%                   # only keep words defined in pattern
  mutate(hour = lubridate::hour(created_at)) %>%  # extract the hour from the created_at time, use week here if you want
  count(word, hour)                               # count the words per hour
# plot
ggplot(pattern_df, aes(x = hour, y = n, fill = word)) +
  geom_col(position = "dodge")
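Since the question asks about a window like one week, you can swap the hour extraction for a coarser bin with lubridate::floor_date(). A sketch (assuming the same tweets and pattern as above; note you'd want to fetch more than 100 tweets to get a meaningful week of data):

```r
library(lubridate)

pattern_df_daily <- tweets %>%
  select(status_id, text, created_at) %>%
  unnest_tokens(word, text) %>%
  filter(word %in% pattern) %>%
  mutate(day = floor_date(created_at, unit = "day")) %>%  # one bin per day; use unit = "week" for weekly bins
  count(word, day)

ggplot(pattern_df_daily, aes(x = day, y = n, fill = word)) +
  geom_col(position = "dodge")
```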
Upvotes: 4