Reputation: 11
How do we get Twitter data (tweets) into HDFS for offline analysis? We have a requirement to analyze tweets.
Upvotes: 1
Views: 1548
Reputation: 1441
The Fluentd log collector recently released its WebHDFS plugin, which lets users stream data into HDFS.
With fluent-plugin-twitter, you can also collect Twitter streams by calling the Twitter APIs. Of course, you can write your own custom collector that posts the stream to Fluentd. Here's a Ruby example to post logs to Fluentd.
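As a rough illustration of the custom-collector idea, here is a minimal sketch that posts one record to Fluentd over HTTP. It assumes Fluentd is running with its HTTP input enabled on `localhost:9880`; the tag `twitter.stream`, the record fields, and the helper names `build_fluentd_request` and `post_to_fluentd` are made up for this example and do not come from any plugin.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed endpoint of Fluentd's HTTP input; adjust to your own setup.
FLUENTD_URL = 'http://localhost:9880'

# Build a POST request carrying one record as a JSON body.
# The URL path is what Fluentd's HTTP input uses as the event tag.
def build_fluentd_request(tag, record, base_url = FLUENTD_URL)
  uri = URI.join(base_url, "/#{tag}")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(record)
  request
end

# Send the record. This needs a reachable Fluentd instance,
# so it is kept separate from the request construction.
def post_to_fluentd(tag, record, base_url = FLUENTD_URL)
  uri = URI.join(base_url, "/#{tag}")
  request = build_fluentd_request(tag, record, base_url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

A call might look like `post_to_fluentd('twitter.stream', { 'user' => 'example', 'text' => 'hello' })`. Fluentd would then route events with that tag to whatever output is configured for it, such as the WebHDFS plugin.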
Upvotes: 1
Reputation: 8088
I would look for a solution in the well-developed area of streaming logs into Hadoop, since the task looks quite similar.
There are two existing systems that do this:
Flume: https://github.com/cloudera/flume/wiki
And
Scribe: https://github.com/facebook/scribe
So your task is only to pull the data from Twitter, which I assume is not part of this question, and feed those logs into one of these systems.
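One simple way to hand the pulled tweets to a log collector is to append them to a local file, one JSON object per line, and let the collector tail or pick up that file. The sketch below shows only that hand-off step. It assumes the tweets are already available as Ruby hashes; the helper names and the spool path are hypothetical, and how Flume or Scribe is configured to read the file is not covered here.

```ruby
require 'json'

# Write one tweet as a single JSON line to any IO-like object.
# JSON.generate escapes embedded newlines, so each record stays on one line.
def append_tweet(io, tweet)
  io.puts(JSON.generate(tweet))
end

# Append a batch of tweets to a spool file that a log collector can read.
# The path is whatever your collector is set up to watch (hypothetical here).
def spool_tweets(path, tweets)
  File.open(path, 'a') do |file|
    tweets.each { |tweet| append_tweet(file, tweet) }
  end
end
```

For example, `spool_tweets('/var/spool/tweets/tweets.log', tweets)` would append each tweet in `tweets` as its own line, a format that line-oriented collectors can ship to HDFS without further parsing.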
Upvotes: 3
Reputation: 2497
This can be a solution to your problem.
Tools to capture Twitter tweets
Capture them in any format (CSV, TXT, DOC, PDF, etc.).
Upvotes: 0