Reputation: 83
I am working with a large Twitter dataset. I am trying to count the words in the Word column, grouped by hour using the Time column, and then display the result as a histogram so I can see how word usage changed over time (the distribution of words over time). Does anybody know how I can do this in R?
A sample of the data is accessible via this link: https://docs.google.com/spreadsheets/d/1JhXEyzkjPs59hVgoS3lW7e0Fcumis62QDUvuMP2q5aQ/edit?usp=sharing
Thanks, James
Upvotes: 0
Views: 476
Reputation: 4993
Read your file into R (the code below assumes the data is stored in a variable named x), then use the following:
library(dplyr)
# Count how many times each word occurs at each timestamp
x %>%
  group_by(Time, Word) %>%
  summarise(count = n())
It returns output like this:
                 Time      Word count
               <fctr>    <fctr> <int>
1 2015/04/30 21:59:00         a     1
2 2015/04/30 21:59:00 baltimore     1
3 2015/04/30 21:59:00     check     1
4 2015/04/30 21:59:00    common     1
5 2015/04/30 21:59:00   grabbed     1
6 2015/04/30 21:59:00      have     1
7 2015/04/30 21:59:00       her     1
You can assign this result to a data frame or data table for further use.
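Note that the grouping above uses the full timestamp, so it counts per minute rather than per hour, and it does not produce a plot. The following is a minimal sketch of one way to do both, not a definitive solution. It assumes the Time values follow the "YYYY/MM/DD HH:MM:SS" format seen in the output above, it uses ggplot2 for plotting (a package not mentioned in the original answer), and the small x built here is a hypothetical stand-in for the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample standing in for the real data set; the column names
# Time and Word and the timestamp format are taken from the output above.
x <- data.frame(
  Time = c("2015/04/30 21:59:00", "2015/04/30 21:59:00",
           "2015/04/30 22:10:00", "2015/04/30 22:45:00"),
  Word = c("a", "baltimore", "a", "a"),
  stringsAsFactors = TRUE
)

hourly <- x %>%
  mutate(
    # Convert the factor to text and parse it as a date-time
    Time = as.POSIXct(as.character(Time),
                      format = "%Y/%m/%d %H:%M:%S", tz = "UTC"),
    # Truncate each timestamp to the start of its hour
    Hour = as.POSIXct(format(Time, "%Y-%m-%d %H:00:00"), tz = "UTC")
  ) %>%
  # One row per (Hour, Word) pair with the number of occurrences
  count(Hour, Word, name = "count")

# Stacked bar chart: total words per hour, coloured by word
p <- ggplot(hourly, aes(x = Hour, y = count, fill = Word)) +
  geom_col() +
  labs(x = "Hour", y = "Word count")
print(p)
```

With a real Twitter data set the number of distinct words will be large, so it may be more readable to filter hourly down to a handful of words of interest before plotting.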
Upvotes: 1