glcohen

Reputation: 193

Ways to pull (potentially) large amounts of data from Twitter

I've been playing around with the Twitter API using Twitter4j. I'm trying to pull tweets for a given keyword and date; an example of a query I would run against the REST API is

bagels since:2014-12-27

This should give me all tweets containing the keyword 'bagels' since 2014-12-27.
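
A minimal sketch of building that query string in Java. `buildQuery` is a hypothetical helper, not part of Twitter4j; the string it returns is the kind of text you could pass to Twitter4j's `new Query(String)` constructor.

```java
import java.time.LocalDate;

public class SearchQueryBuilder {
    // Hypothetical helper: builds "keyword since:YYYY-MM-DD" as in the query above.
    static String buildQuery(String keyword, LocalDate since) {
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), the format the since: operator uses
        return keyword + " since:" + since;
    }

    public static void main(String[] args) {
        // prints "bagels since:2014-12-27"
        System.out.println(buildQuery("bagels", LocalDate.of(2014, 12, 27)));
    }
}
```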

This works in theory, but I quickly exceeded the rate limits: each query returns at most 100 results, and only 180 queries are allowed per 15-minute window, so I can fetch at most 18,000 tweets per window. Many keywords return more than 18,000 results.
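
The arithmetic behind that ceiling, as a small Java sketch. `tweetsPerWindow` and `windowsNeeded` are hypothetical helpers, and the 50,000-tweet total in `main` is an illustrative number, not a measured one.

```java
public class RateLimitBudget {
    // Tweets retrievable in one 15-minute window: requests allowed x results per request.
    static long tweetsPerWindow(int requestsPerWindow, int resultsPerRequest) {
        return (long) requestsPerWindow * resultsPerRequest;
    }

    // Number of 15-minute windows needed to pull totalTweets (ceiling division).
    static long windowsNeeded(long totalTweets, int requestsPerWindow, int resultsPerRequest) {
        long perWindow = tweetsPerWindow(requestsPerWindow, resultsPerRequest);
        return (totalTweets + perWindow - 1) / perWindow;
    }

    public static void main(String[] args) {
        System.out.println(tweetsPerWindow(180, 100));        // 18000
        System.out.println(windowsNeeded(50_000, 180, 100));  // 3
    }
}
```

Plugging in 450 requests per window (the Application-Only limit mentioned in the answer) gives 45,000 tweets per window, so the same hypothetical 50,000-tweet pull would need 2 windows instead of 3.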

Is there a better way to pull large amounts of data from Twitter? I looked at the Streaming API, but I don't know whether it lets me pull data from a specific date range.

Upvotes: 1

Views: 167

Answers (1)

Joe Mayo

Reputation: 7513

There are a few things you can do to improve your rates:

  1. Make sure your count is maxed at 100, which it looks like you're doing.
  2. Use Application-Only authorization; it raises the search rate limit from 180 to 450 requests per 15-minute window.
  3. Use the max_id and since_id parameters to page through data and avoid querying for results you've already received. See the Working with Timelines docs for details.
  4. Consider using Gnip if you're willing to pay to remove rate limits.
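
A self-contained simulation of the max_id/since_id paging in item 3, assuming the documented semantics that a search returns tweets newest first with since_id < id <= max_id. `searchPage` is a hypothetical in-memory stand-in for a real search request (in Twitter4j the corresponding setters are `Query.setMaxId` and `Query.setSinceId`), using `Long.MAX_VALUE` as "no upper bound" is this sketch's own choice, and the 250 IDs are fake.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxIdPaging {
    // Hypothetical stand-in for one search request: returns up to `count` IDs,
    // newest first, with sinceId < id <= maxId.
    static List<Long> searchPage(List<Long> newestFirst, long sinceId, long maxId, int count) {
        List<Long> page = new ArrayList<>();
        for (long id : newestFirst) {
            if (id > sinceId && id <= maxId) {
                page.add(id);
                if (page.size() == count) break;
            }
        }
        return page;
    }

    // Walks backwards through results: after each page, max_id becomes the oldest
    // ID seen minus 1, so no tweet is requested twice. Stops on an empty page.
    static List<Long> fetchAll(List<Long> newestFirst, long sinceId, int count) {
        List<Long> all = new ArrayList<>();
        long maxId = Long.MAX_VALUE;
        while (true) {
            List<Long> page = searchPage(newestFirst, sinceId, maxId, count);
            if (page.isEmpty()) break;
            all.addAll(page);
            maxId = page.get(page.size() - 1) - 1; // oldest ID on this page, minus one
        }
        return all;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 250; id >= 1; id--) ids.add(id); // 250 fake tweet IDs, newest first
        System.out.println(fetchAll(ids, 0L, 100).size());   // 250, no duplicates
        System.out.println(fetchAll(ids, 200L, 100).size()); // 50: only IDs newer than since_id=200
    }
}
```

Subtracting 1 from the oldest ID matters because max_id is inclusive; without it, every page after the first would repeat one tweet. On a later run, passing the newest ID from the previous run as since_id fetches only tweets posted since then.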

Upvotes: 1
