Reputation: 73
I am working with Apache Flink and Spark, each with a Twitter connector (flink-connector-twitter_2.12 and spark-streaming-twitter from Apache Bahir), to receive real-time tweets and run predictions on them with an SVM.
Flink:
val streamSource: DataStream[String] = strEnv.addSource(new TwitterSource(properties))
...
Spark:
TwitterUtils.createStream(streamingContext, auth)
...
Both applications are running on a cluster using the APIs mentioned above.
My problem is the low input rate from Twitter. The Spark application averages 51.98 records/sec, which is extremely low compared to the real Twitter volume (6k per second).
Question: Is there any way to improve the input rate?
I appreciate any help :) thanks
Upvotes: 0
Views: 87
Reputation: 3864
By default, Flink's TwitterSource uses the sample API, which returns only a sample of tweets in real time. Note that this API is rate-limited, like all standard non-paid Twitter APIs; the limits are described in detail in Twitter's documentation. The best option would be to switch to the Premium Twitter API, which does not have these limitations.
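As a rough sanity check, the two figures quoted in the question can be compared directly. The sketch below uses only those numbers (51.98 records/sec observed, 6k tweets/sec overall); the object and value names are made up for illustration and nothing here is measured.

```scala
object SampleFraction {
  // Figures quoted in the question; neither is measured by this code.
  val observedPerSec: Double = 51.98  // average input rate reported by the Spark application
  val totalPerSec: Double = 6000.0    // overall tweet rate cited in the question

  // Fraction of the full stream that actually reaches the application.
  val fraction: Double = observedPerSec / totalPerSec

  def main(args: Array[String]): Unit =
    println(f"${fraction * 100}%.2f%% of the full stream")
}
```

The result is a little under 1% of the full stream, which fits a sampled endpoint being the limit rather than a bottleneck in the Flink or Spark job.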
Upvotes: 4