Reputation: 157
I am trying to analyze Twitter data using R by plotting the number of tweets over a period of time. When I ran
plot(tweet_df$created_at, tweet_df$text)
I got this error message:
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
Here is the code which I used:
library("rjson")
json_file <- "tweet.json"
json_data <- fromJSON(file=json_file)
library("streamR")
tweet_df <- parseTweets(tweets = json_file)
#using the twitter data frame
tweet_df$created_at
tweet_df$text
plot(tweet_df$created_at, tweet_df$text)
Upvotes: 1
Views: 1152
Reputation: 615
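You've got a couple of issues here, but nothing insurmountable. The error comes from passing two character columns (created_at and text) to plot, which tries to coerce them to numbers and ends up with nothing but NAs. If you want to track tweets over time, what you're really asking for is the number of tweets created per time interval (per second, per minute, and so on). That means you only need the created_at column, and you can build the graph with R's hist function.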
If you want to split by words mentioned in text, that's doable too, but you should probably use ggplot2 for it and ask a separate question. In any case, it looks like parseTweets returns Twitter's timestamps as a character field, so you'll want to convert it to a POSIXct timestamp that R can understand. Assuming you have a data frame that looks something like this:
❥ head(tweet_df[,c("id_str","created_at")])
id_str created_at
1 597862782101561346 Mon May 11 20:36:09 +0000 2015
2 597862782097346560 Mon May 11 20:36:09 +0000 2015
3 597862782105694208 Mon May 11 20:36:09 +0000 2015
4 597862782105694210 Mon May 11 20:36:09 +0000 2015
5 597862782076198912 Mon May 11 20:36:09 +0000 2015
6 597862782114078720 Mon May 11 20:36:09 +0000 2015
You can do that like this:
❥ dated_tweets <- as.POSIXct(tweet_df$created_at, format = "%a %b %d %H:%M:%S +0000 %Y", tz = "UTC")
That gives you a vector of tweet timestamps as POSIXct values, which hist can plot directly. I left the sample Twitter stream open for about 15 minutes; this is the result:
❥ hist(dated_tweets, breaks = "secs", freq = TRUE)
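If per-second bars are too fine, the same idea works with wider bins. Here is a minimal, self-contained sketch that uses a few made-up timestamps in the created_at format shown above (the sample values and the names created_at and per_minute are assumptions for illustration, not taken from your data). It also assumes an English locale, since %a and %b match locale-specific day and month names.

```r
# Hypothetical sample values in the same format as tweet_df$created_at
created_at <- c(
  "Mon May 11 20:36:09 +0000 2015",
  "Mon May 11 20:36:41 +0000 2015",
  "Mon May 11 20:37:05 +0000 2015",
  "Mon May 11 20:39:58 +0000 2015"
)

# "+0000" is matched literally, so state the time zone explicitly.
# %a and %b assume English day and month names (locale-dependent).
dated_tweets <- as.POSIXct(created_at,
                           format = "%a %b %d %H:%M:%S +0000 %Y",
                           tz = "UTC")

# Strings that fail to parse come back as NA, so check before plotting
stopifnot(!anyNA(dated_tweets))

# Count tweets per minute; minutes with no tweets show up with a count of 0
per_minute <- table(cut(dated_tweets, breaks = "min"))
print(per_minute)

# Histogram with one bar per minute instead of one per second
hist(dated_tweets, breaks = "mins", freq = TRUE,
     main = "Tweets per minute", xlab = "Time (UTC)")
```

The table from cut is handy if you want the counts themselves (for example, to pass to barplot or to inspect the busiest minute) rather than just the picture.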
Upvotes: 3