domtr

Reputation: 19

Creating a graph of frequency of a specific word from a dataframe over a time period in R

I have a dataframe of tweets in R, looking like this:

  tweet_text                                              tweet_time          rdate        twt
  <chr>                                                   <dttm>              <date>     <dbl>
1 No New England cottage is complete without nautical t.. 2016-08-25 09:21:00 2016-08-25     1
2 Justice Scalia spent his last hours with members of co… 2016-11-24 16:28:00 2016-11-24     1
3 WHAT THE FAILED OKLAHOMA ABORTION BILL TELLS US http:/… 2016-11-24 16:27:00 2016-11-24     1
4 Bipartisan bill in US Senate to restrict US arms sales… 2016-10-26 07:03:00 2016-10-26     1
5 #MustResign campaign is underway with the heat p his S… 2016-10-01 08:15:00 2016-10-01     1

Each tweet has a specific date assigned, and all tweets in the dataframe come from a period of one year. I want to find the frequency of one specific word ("Senate", for example) over the entire period and plot a graph showing how that frequency changed over time. I am fairly new to R and could only think of very complicated ways to do it, but I am sure there must be a simple one.

I appreciate any suggestions.

Upvotes: 1

Views: 425

Answers (1)

Justin Landis

Reputation: 2071

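# Count case-insensitive occurrences of `pattern` in each element of `text`.
textFreq <- function(pattern, text){
    freq <- gregexpr(pattern = pattern, text = text, ignore.case = TRUE)
    freq <- lapply(freq, FUN = function(x){
            # gregexpr returns a single -1 when there is no match
            if(length(x)==1&&x==-1){
                return(0)
            } else {
                return(length(x))
            }
        })
    freq <- unlist(freq)
    return(freq)
}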

test.text <- c("senate.... SENate.. sen","Working in the senate...", "I like dogs")
textFreq(pattern = "senate", test.text)
# [1] 2 1 0

You can use dplyr to group the tweets by time period and then use mutate to add the per-tweet and per-group counts:

library(dplyr)
library(magrittr)
data <- data %>% 
    group_by(*somedatefactor*) %>% # e.g. a date bin, if you want to aggregate every 10 days or so
    mutate(SenateFreqPerTweet = textFreq(pattern = "Senate", text = tweet_text),
           SenateFreqTotal = sum(SenateFreqPerTweet)) #Counts sum based on current grouping
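
As a rough sketch of what *somedatefactor* could look like, the example below bins the dates by calendar month with base R's cut() and sums the matches per bin. The toy data, the month column and the count_matches helper are assumptions made for illustration; count_matches is only meant to behave like textFreq above.

```r
library(dplyr)

# Hypothetical helper with the same intent as textFreq above:
# number of case-insensitive matches of `pattern` in each element of `text`.
count_matches <- function(pattern, text) {
  lengths(regmatches(text, gregexpr(pattern, text, ignore.case = TRUE)))
}

# Made-up rows standing in for the tweets dataframe.
tweets <- data.frame(
  tweet_text = c("Senate votes today", "I like dogs",
                 "The senate and the Senate", "No mention here"),
  rdate = as.Date(c("2016-08-25", "2016-08-30", "2016-10-01", "2016-10-26")),
  stringsAsFactors = FALSE
)

monthly <- tweets %>%
  # cut() labels each date with the first day of its month
  mutate(month = as.Date(cut(rdate, breaks = "month"))) %>%
  group_by(month) %>%
  summarise(SenateFreqTotal = sum(count_matches("senate", tweet_text)))

print(monthly)
```

Passing breaks = "10 days" to cut() instead should give 10-day bins, counted from the earliest date in the data.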

You may even wrap the previous statement in a function of your own. To do so, see the "Programming with dplyr" vignette.
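
One possible shape for such a wrapper, assuming a dplyr version that supports the {{ }} (embrace) operator and the .groups argument (1.0.0 or later). The function name word_freq_by, its arguments and the toy data are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical wrapper: counts case-insensitive matches of `pattern`
# in `text_col`, summed within each value of `group_col`.
word_freq_by <- function(data, text_col, group_col, pattern) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(
      freq = sum(lengths(regmatches(
        {{ text_col }},
        gregexpr(pattern, {{ text_col }}, ignore.case = TRUE)
      ))),
      .groups = "drop"
    )
}

# Made-up rows for illustration only.
toy <- data.frame(
  tweet_text = c("Senate bill", "senate, SENATE", "dogs"),
  rdate = as.Date(c("2016-10-26", "2016-10-26", "2016-11-24")),
  stringsAsFactors = FALSE
)

res <- word_freq_by(toy, tweet_text, rdate, "senate")
print(res)
```

Column names are passed unquoted, as in other dplyr verbs, so the same function can be reused for a different word or a different grouping column.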

Either way, with this approach you can easily plot SenateFreqTotal using the ggplot2 package:

 library(ggplot2)
 data2 <- data %>% # it may help to reduce the size of the dataframe before plotting
     select(SenateFreqTotal, *somedatefactor*) %>% 
     distinct()
 ggplot(data2, aes(y = SenateFreqTotal, x = *somedatefactor*)) + 
     geom_bar(stat = "identity")
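
If a trend line reads better than bars, the aggregated totals could also be drawn with geom_line. This is only a sketch: the totals dataframe and its values are made up, and month stands in for whatever date column was used for grouping.

```r
library(ggplot2)

# Made-up monthly totals standing in for data2.
totals <- data.frame(
  month = as.Date(c("2016-08-01", "2016-09-01", "2016-10-01")),
  SenateFreqTotal = c(4, 9, 6)
)

p <- ggplot(totals, aes(x = month, y = SenateFreqTotal)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %Y") +  # e.g. "Aug 2016"
  labs(x = "Month", y = "Occurrences of \"Senate\"")

print(p)
```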

If you do not want to aggregate the frequencies, you can plot the per-tweet counts directly:

ggplot(data, aes(y=SenateFreqPerTweet, x = tweet_time)) + 
    geom_bar(stat = "identity")

Upvotes: 1
