Reputation: 19
I have a dataframe of tweets in R, looking like this:
tweet_text tweet_time rdate twt
<chr> <dttm> <date> <dbl>
1 No New England cottage is complete without nautical t.. 2016-08-25 09:21:00 2016-08-25 1
2 Justice Scalia spent his last hours with members of co… 2016-11-24 16:28:00 2016-11-24 1
3 WHAT THE FAILED OKLAHOMA ABORTION BILL TELLS US http:/… 2016-11-24 16:27:00 2016-11-24 1
4 Bipartisan bill in US Senate to restrict US arms sales… 2016-10-26 07:03:00 2016-10-26 1
5 #MustResign campaign is underway with the heat p his S… 2016-10-01 08:15:00 2016-10-01 1
Each tweet has a specific date assigned, and all tweets in the dataframe are from a period of one year. I want to find the frequency of one specific word ("Senate", for example) over the entire period and plot a graph showing how that frequency changed over time. I am fairly new to R and could only think of very complicated ways to do it, but I am sure there must be a simple one.
I appreciate any suggestions.
Upvotes: 1
Views: 425
Reputation: 2071
textFreq <- function(pattern, text){
  # gregexpr returns the match positions for each string, or -1 if there is no match
  freq <- gregexpr(pattern = pattern, text = text, ignore.case = TRUE)
  freq <- lapply(freq, FUN = function(x){
    if(length(x)==1&&x==-1){
      return(0)
    } else {
      return(length(x))
    }
  })
  freq <- unlist(freq)
  return(freq)
}
test.text <- c("senate.... SENate.. sen","Working in the senate...", "I like dogs")
textFreq(pattern = "senate", test.text)
# [1] 2 1 0
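Because the pattern is treated as a regular expression, "senate" also matches inside longer words such as "senates". The sketch below is one way to restrict the count to the whole word. The helper name countMatches is made up for illustration, and perl = TRUE is an assumption added here so that \\b is interpreted by the PCRE engine; neither is part of the function above.

```r
# Hypothetical compact equivalent of textFreq (the name countMatches is made up).
# perl = TRUE is an added assumption so that \\b is handled by the PCRE engine.
countMatches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  # regmatches() returns character(0) where nothing matched, so lengths() gives 0
  lengths(regmatches(text, hits))
}

test.text <- c("senate.... SENate.. sen", "Working in the senate...", "I like dogs")
countMatches("senate", test.text)
# [1] 2 1 0

# Plain "senate" also counts "senates"; \\b restricts the match to the whole word
plural.text <- c("Both state senates met", "the Senate voted")
countMatches("senate", plural.text)
# [1] 1 1
countMatches("\\bsenate\\b", plural.text)
# [1] 0 1
```

Whether plurals and possessives should count is a judgment call for your analysis; the pattern is the place to encode that decision.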
You can use dplyr to group by time period and use mutate to add the counts:
library(dplyr)
library(magrittr)
data <- data %>%
  group_by(*somedatefactor*) %>% # placeholder for your grouping column, e.g. if you wanted to aggregate every 10 days
  mutate(SenateFreqPerTweet = textFreq(pattern = "Senate", text = tweet_text),
         SenateFreqTotal = sum(SenateFreqPerTweet)) # sum within the current grouping
You could also wrap the previous statement in a function of its own. To do so, check out the "Programming with dplyr" vignette.
Regardless, with this approach you can easily plot SenateFreqTotal with the ggplot2 package:
library(ggplot2)
data2 <- data %>% # it may help to reduce the size of the dataframe before plotting
  select(SenateFreqTotal, *somedatefactor*) %>%
  distinct()
ggplot(data2, aes(y = SenateFreqTotal, x = *somedatefactor*)) +
  geom_bar(stat = "identity")
If you do not want to aggregate the frequencies, you can plot the per-tweet counts directly:
ggplot(data, aes(y = SenateFreqPerTweet, x = tweet_time)) +
  geom_bar(stat = "identity")
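To make the steps above concrete, here is a minimal end-to-end sketch under a few assumptions: the toy tweets are made up for illustration (only the column names follow the question), the grouping column is a calendar month derived from rdate, and countWord is a hypothetical stand-in for textFreq. It also uses summarise rather than mutate followed by distinct, and geom_col, which is shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Toy data, made up for illustration; column names follow the question.
tweets <- data.frame(
  tweet_text = c("Bill in US Senate to restrict arms sales",
                 "Senate vote delayed, senate aides say",
                 "I like dogs",
                 "The Senate reconvenes"),
  rdate = as.Date(c("2016-10-26", "2016-10-01", "2016-11-24", "2016-11-24")),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for textFreq: counts case-insensitive matches per string.
countWord <- function(pattern, text) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE)
  # gregexpr gives -1 for "no match", so count only the positive positions
  vapply(hits, function(x) sum(x > 0), integer(1))
}

monthly <- tweets %>%
  mutate(month = format(rdate, "%Y-%m"),   # assumed grouping: calendar month
         SenateFreqPerTweet = countWord("senate", tweet_text)) %>%
  group_by(month) %>%
  summarise(SenateFreqTotal = sum(SenateFreqPerTweet))

# One bar per month, with height equal to the total count for that month
p <- ggplot(monthly, aes(x = month, y = SenateFreqTotal)) +
  geom_col()
print(p)
```

For weekly or 10-day bins, only the line that builds the grouping column needs to change; the counting and plotting steps stay the same.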
Upvotes: 1