Will Twitter's rate limits allow me to do the data mining necessary to construct a complete social network graph of about 600K users?

Primary question: Will Twitter's rate limits allow me to do the data mining necessary to construct a complete social network graph with all directed edges among about 600K users?

Here is the idea:

The edges/ties/relations in the network will be follower/followed relationships.

Start with a specific list of approximately 600 Twitter users, chosen because together they represent all of the news outlets in one large city.

Collect all of the followers and friends (people they follow) for all 600 users. These users probably have an average number of followers of 2,000 each. They probably have an average number of friends (people they follow) of 500.

Since the followers of the 600 are mostly in the same city, I expect heavy overlap: many of the same users will follow several of these 600 people. So let's approximate and guess that these 600 users have approximately 600,000 distinct followers and friends in total. This would be a subgraph/network of 600,600 total Twitter users.

So once I have collected all of the 600,000 followers and friends of all of these 600 people, I want to be able to construct a social network of all of these 600,600 people AND their followers. This would require me to be able to at least find all of the directed edges amongst these 600,600 users (whether or not each of these 600,600 users follow each other). With Twitter rate limits, would this kind of data mining be feasible?
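To make the last step concrete: once a list of friend IDs has been collected for every user in the set, the directed edges of the subgraph are the (follower, followed) pairs whose two ends are both in the set. Here is a minimal sketch, assuming the collected data sits in a plain dict; the name `friends_of` and the sample IDs are made up for illustration.

```python
def induced_edges(friends_of):
    """Return directed edges (a, b), meaning 'a follows b', with both ends in the user set.

    friends_of maps each user id in the set to the ids that user follows.
    """
    members = set(friends_of)
    return {
        (user, friend)
        for user, friends in friends_of.items()
        for friend in friends
        if friend in members
    }

# Toy data: user 1 follows 2 and 99; 99 is outside the set, so that edge is dropped.
friends_of = {1: [2, 99], 2: [1, 3], 3: []}
edges = induced_edges(friends_of)
print(sorted(edges))  # [(1, 2), (2, 1), (2, 3)]
```

Under this framing, friend lists alone would be enough for the edges inside the set, because any follow relationship between two members shows up in the follower's friend list.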

Upvotes: 4

Answers (2)

Mehdi

Reputation: 7403

Primary question: Will Twitter's rate limits allow me to do the data mining (...)

Yes, it is technically feasible, but it will take ages if you are using only one API user access token. I mean probably more than 6 months of uninterrupted running.

To be more precise:

The extraction of nodes (Twitter users) can be done very quickly, as you will use the users/lookup API endpoint, which lets you extract 100 nodes per request and make 180 requests per 15-minute window (per access token you have).
The extraction of edges (follow relationships between users) is the slow part: you will use the friends/ids and followers/ids API endpoints, which are limited to 15 queries per 15 minutes and return at most 5000 friends or followers of a single user per request.
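A rough back-of-the-envelope version of these numbers, as a sketch only. It assumes the 600,600 users from the question, one friends/ids request per user (that is, fewer than 5000 friends each), and that a full window is spent for every batch of requests; the function name `crawl_hours` is made up for illustration.

```python
import math

WINDOW_MINUTES = 15

def crawl_hours(n_requests, requests_per_window, n_tokens=1):
    """Hours needed to issue n_requests under a per-window, per-token limit."""
    windows = math.ceil(n_requests / (requests_per_window * n_tokens))
    return windows * WINDOW_MINUTES / 60

N_USERS = 600_600  # figure from the question

# Nodes: users/lookup, 100 users per request, 180 requests per window.
node_requests = math.ceil(N_USERS / 100)
node_hours = crawl_hours(node_requests, 180)

# Edges: assume one friends/ids request per user, 15 requests per window.
edge_hours = crawl_hours(N_USERS, 15)

print(f"nodes: {node_hours:.1f} hours")       # nodes: 8.5 hours
print(f"edges: {edge_hours / 24:.0f} days")   # edges: 417 days
```

Under these assumptions the nodes take well under a day, while the edges take over a year on a single token. Users with more than 5000 friends need extra requests, so the real figure would be somewhat higher.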

You can use the node metadata (description texts, locations, languages, time zones) to perform some interesting analysis, even without having extracted the 'graph' (the follow relationships between everyone).

A workaround is to parallelize the extraction by spreading sub-parts of it across several access tokens. This seems compliant with the terms of use to me, as long as you respect protected accounts.
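One simple way to split the work is to assign user IDs to tokens round-robin, so each token gets a near-equal share and runs against its own rate-limit window. This is only a sketch of the partitioning step; the token names are hypothetical placeholders and no API calls are made.

```python
def shard_round_robin(user_ids, tokens):
    """Assign user ids to tokens in round-robin order so each token gets a near-equal share."""
    shards = {token: [] for token in tokens}
    for i, user_id in enumerate(user_ids):
        shards[tokens[i % len(tokens)]].append(user_id)
    return shards

tokens = ["token_a", "token_b", "token_c"]  # hypothetical placeholders, not real credentials
shards = shard_round_robin(list(range(10)), tokens)
for token, ids in shards.items():
    print(token, ids)
# token_a [0, 3, 6, 9]
# token_b [1, 4, 7]
# token_c [2, 5, 8]
```

With N tokens working in parallel, the edge extraction time drops by roughly a factor of N.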

In any case, you should skip edge extraction for celebrities (you probably do not want to extract the followers of hootsuite; there are almost 6 million of them).
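Since users/lookup returns a `followers_count` field for each user, the filter can be applied before any edge request is made. A minimal sketch; the cutoff value and the sample records are arbitrary and only for illustration.

```python
MAX_FOLLOWERS = 50_000  # arbitrary cutoff chosen for illustration

def split_by_audience(users, max_followers=MAX_FOLLOWERS):
    """Split user records into (keep, skip) by follower count."""
    keep = [u for u in users if u["followers_count"] <= max_followers]
    skip = [u for u in users if u["followers_count"] > max_followers]
    return keep, skip

# Made-up records shaped like the relevant part of a users/lookup response.
users = [
    {"id": 1, "followers_count": 1_800},
    {"id": 2, "followers_count": 5_900_000},
    {"id": 3, "followers_count": 420},
]
keep, skip = split_by_audience(users)
print([u["id"] for u in keep])  # [1, 3]
print([u["id"] for u in skip])  # [2]
```

For scale: at 5000 IDs per request and 15 requests per 15 minutes, an account with 6 million followers would need about 1200 requests, or roughly 20 hours on one token, just for its follower list.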

Disclaimer, self-promotion here: in case you do not want to develop this yourself, I could do the extraction for you and provide you with the graph file, as I am extracting Twitter graphs at tribalytics. (I have read this and that before posting).

I'm also trying to figure out if there is any process for requesting higher rate limits for any kind of research purposes

Officially, there are no longer any white-listed apps with higher rate limits, as there could be with the previous version of Twitter's API. You should probably still contact Twitter and see whether they can help you, since your work is for academic purposes.

Chances are that I will have to scale down the project, which is OK

I would advise you to reduce your initial list of 600 users as much as you can. Only keep those who are really central to your topic and whose audience is not too large. Extracting the graph of local celebrities will give you a graph with many people not related at all to the population you want to study.

Upvotes: 0

TJE

Reputation: 580

I'll answer these questions in reverse order, starting with David Marx first: Well, I do have access to a pretty robust computer research center with a ton of storage capacity, so that should not be an issue. I don't know if the software can handle it, however.

Chances are that I will have to scale down the project, which is OK. The idea for me is to start out with a bigger idea, figure out how big it can be, and then pare down accordingly.

Following up on Anony-Mousse's question now: Part of my problem is that I am not sure I am interpreting the Twitter rate limits correctly. I'm not sure if it's 15 requests per 15 minutes, or 30 requests per 15 minutes. And I think 1 request will get 5000 followers/friends, so you could presumably collect 75,000 friends or followers every 15 minutes if the limit is 15 requests per 15 minutes. I'm also trying to figure out if there is any process for requesting higher rate limits for any kind of research purposes.

Here is where they list the limits: https://dev.twitter.com/docs/rate-limiting/1.1/limits

Upvotes: 1
