Arturo Calvo

Reputation: 58

Is there a way to search for a specific word in a Twitter timeline over a period of time?

I am doing some Twitter analysis in R. Right now I'm getting the tweets from an account, but I want to search for one or more specific words inside those tweets and then make a plot that shows how often the word has been repeated over a period of time, for example one week.

This is my script:

# install libraries
library(wordcloud)
library(SnowballC)
library(rtweet)
library(tm)
library(RColorBrewer)
library(tidytext)
library(dplyr)
library(wordcloud2)
library(stringr)
library(qdapRegex)

# Authentication and token retrieval
appname <- "AnalisisTwitSent"
consumer_key     <- "CvgpjfxMIyUmg21HFPSKoFKr4"
consumer_secret  <- "5VO0fWH6QK5jyYWx4PtABHyhvvZ5JyVjDNjQ2F36mDjYibu5g7"
access_token <- "2820319925-CTKOd9yiA8MmJlak1iXUDCbg2MKkKDlffjr9LyV"
access_secret <- "ZiZBJIjxqY9lNLemYdGxMD6BYM6eY43NyLGhRS4NRKu5S"

twitter_token <- create_token(app = appname, 
                           consumer_key = consumer_key, 
                           consumer_secret = consumer_secret,
                           access_token = access_token, 
                           access_secret = access_secret,
                           set_renv = TRUE)

ver_palabras_comunes_nube <- function(busqueda, cantidad) {

  # Get tweets
  #tweets <- get_timeline(usuario, n = cantidad, 
                     #parse = TRUE, check = TRUE,
                     #include_rts = TRUE)
  tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)

  text <- str_c(tweets$text, collapse = " ")  # join tweets with a space so words don't run together

  # continue cleaning the text
  text <- 
    text %>%
    str_remove_all("\\n") %>%               # remove all linebreaks, not just the first
    rm_twitter_url() %>%                    # Remove URLS
    rm_url() %>%
    str_remove_all("#\\S+") %>%             # Remove any hashtags
    str_remove_all("@\\S+") %>%             # Remove any @ mentions
    removeWords(stopwords("spanish")) %>%   # Remove common words (a, the, it etc.)
    removeNumbers() %>%
    stripWhitespace() %>%
    removeWords(c("amp"))                   # Final cleanup of other small changes
    gsub("\\p{So}|\\p{Cn}", "", text, perl = TRUE)


  rm_emoticon(text, replacement = "")

  # Convert the data into a summary table
  textCorpus <- 
    Corpus(VectorSource(text)) %>%
    TermDocumentMatrix() %>%
    as.matrix()

  textCorpus <- sort(rowSums(textCorpus), decreasing=TRUE)
  textCorpus <- data.frame(word = names(textCorpus), freq=textCorpus, row.names = NULL)

  wordcloud <- wordcloud2(data = textCorpus, minRotation = 0, maxRotation = 0)
  wordcloud
}
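
For reference, a call of the function above could look like this; the search term and tweet count are just made-up example values:

# hypothetical example call; "rstats" and 500 are arbitrary values
ver_palabras_comunes_nube("rstats", 500)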

Upvotes: 4

Views: 349

Answers (1)

JBGruber

Reputation: 12420

To get a frequency plot over time for specific words, you really only have to count how often they appear in each time slot and then plot them. I'm using the tidytext package here, which works really nicely for this. But you could also think about just using stringr::str_count() (watch out for correct tokenisation in that case, though); a sketch of that alternative follows after the main example. You put your code in a function, which isn't really necessary here, but I wrote the code so you can quickly put it back into a function if you like.

library(rtweet)
library(tidyverse)
library(tidytext)

# define variables
busqueda <- "Poppycock"   
cantidad <- 100
pattern <- c("is", "to")

# query tweets
tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)


# count the occurrences of the pattern words
pattern_df <- tweets %>% 
  select(status_id, text, created_at) %>%          # only keep data columns we need later
  unnest_tokens(word, text) %>%                    # split the text into tokens (words)
  filter(word %in% pattern) %>%                    # only keep words defined in pattern
  mutate(hour = lubridate::hour(created_at)) %>%   # extract the hour from the created_at time, use week here if you want
  count(word, hour)                                # count the words per hour

# plot
ggplot(pattern_df, aes(x = hour, y = n, fill = word)) +
  geom_col(position = "dodge")
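
If you prefer the stringr::str_count() route mentioned above, a minimal sketch could look like the following. It assumes the same tweets data frame and pattern vector from the code above; the \\b word boundaries are only a stand-in for proper tokenisation, so that e.g. "is" doesn't match inside "this".

library(tidyverse)

# count regex matches per word and hour; assumes `tweets` and `pattern` exist as above
counts_per_hour <- purrr::map_dfr(pattern, function(w) {
  tweets %>%
    mutate(
      hour = lubridate::hour(created_at),
      # word boundaries keep "is" from matching inside "this"
      n_word = str_count(str_to_lower(text), paste0("\\b", w, "\\b"))
    ) %>%
    group_by(hour) %>%
    summarise(n = sum(n_word), .groups = "drop") %>%
    mutate(word = w)
})

ggplot(counts_per_hour, aes(x = hour, y = n, fill = word)) +
  geom_col(position = "dodge")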

Upvotes: 4
