Reputation: 143
I am trying to grab tweets using the Topsy Otter api, so I can perform some data mining on it for my dissertation.
So far, I have got:
library(RJSONIO)
library(RCurl)
tweet_data <- getURL("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx")
fromJSON(tweet_data)
Which works fine. Now however, I want to return just a couple details from this file, 'content' and 'trackback_date'. I cannot seem to figure out how - I have tried cobbling a couple of examples together, but unable to extract what I want.
Here is what I've tried so far:
trackback_date <- lapply(tweet_data$result, function(x){x$trackback_date})
content <- lapply(tweet_data$result, function(x){x$content})
Any help would be greatly appreciated, thank you.
edit I have also tried:
library("rjson")
# use rjson
tweet_data <- fromJSON(paste(readLines("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx"), collapse=""))
# get a data from Topsy Otter API
# convert JSON data into R object using fromJSON()
trackback_date <- lapply(tweet_data$result, function(x){x$trackback_date})
content <- lapply(tweet_data$result, function(x){x$content})
Upvotes: 2
Views: 1726
Reputation: 4941
Basic processing of Topsy Otter API response:
library(RJSONIO)
library(RCurl)
tweet_data <- getURL("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx")
#
# Addition to your code
#
tweets <- fromJSON(tweet_data)$response$list
content <- sapply(tweets, function(x) x$content)
trackback_date <- sapply(tweets, function(x) x$trackback_date)
EDIT: Processing multiple pages
Function gets 100 items from specified page
:
pagetweets <- function(page){
url <- paste("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&page=",page,
"&perpage=100&maxtime=1304226000&apikey=xxx",
collapse="", sep="")
tweet_data <- getURL(url)
fromJSON(tweet_data)$response$list
}
Now we can apply it to multiple pages:
tweets <- unlist(lapply(1:10, pagetweets), recursive=F)
And, voila, this code:
content <- sapply(tweets, function(x) x$content)
trackback_date <- sapply(tweets, function(x) x$trackback_date)
returns you 1000 records.
Upvotes: 5