Twitter Streaming API - low Input rate in flink/spark application

Question

I am working with Apache Flink and Apache Spark and a Twitter connector (flink-connector-twitter_2.12 for Flink, and spark-streaming-twitter from Apache Bahir for Spark) to receive real-time tweets and classify them with an SVM.

Flink:

val streamSource: DataStream[String] = strEnv.addSource(new TwitterSource(properties))
...

Spark:

TwitterUtils.createStream(streamingContext, auth)
...

Both applications run on a cluster and use the connectors mentioned above.

My problem is the low input rate from Twitter. The Spark application has an average input rate of 51.98 records/sec, which is extremely low compared to the actual volume of tweets on Twitter (around 6,000 per second).

Question: Is there any way to improve the input rate?

I appreciate any help :) thanks

Dominik Wosiński · Accepted Answer

By default, Flink's TwitterSource uses the sample API. This API returns a small random sample (roughly 1%) of public tweets in real time. It's worth noting that this API is limited, like all standard (non-paid) Twitter APIs; the rate limiting is described in detail in Twitter's API documentation. The best idea would be to switch to the Premium Twitter API, which does not have these limitations.
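A quick back-of-the-envelope check is consistent with this explanation. Assuming the sample endpoint delivers roughly 1% of public tweets, about 6,000 tweets/sec overall would yield about 60 tweets/sec, close to the ~52 records/sec reported for the Spark job (about 0.87% of the total). The Scala sketch below only does that arithmetic; the object and value names are hypothetical, and the 1% fraction is an assumption, not a measured value.

```scala
// Hypothetical sanity check: compare the observed input rate with
// an ASSUMED ~1% sample of the total tweet volume quoted in the question.
object SampleRateCheck {
  val observedPerSec: Double = 51.98        // Spark input rate reported in the question
  val totalPerSec: Double = 6000.0          // approximate global tweet rate quoted in the question
  val assumedSampleFraction: Double = 0.01  // assumption: sample endpoint delivers ~1% of tweets

  // Fraction of the full stream that actually reaches the application
  def observedFraction: Double = observedPerSec / totalPerSec

  // Rate we would expect if the endpoint delivers ~1% of all tweets
  def expectedSampleRate: Double = totalPerSec * assumedSampleFraction

  def main(args: Array[String]): Unit = {
    println(f"observed fraction: ${observedFraction * 100}%.2f%%")
    println(f"expected sample rate: $expectedSampleRate%.1f tweets/sec")
  }
}
```

If the observed rate stays near the expected sample rate regardless of cluster size, the bottleneck is the endpoint, not the Flink or Spark job.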
