Kartik Ramalingam

Reputation: 11

Twitter - Hadoop Data Streaming

How do we get Twitter data (tweets) into HDFS for offline analysis? We have a requirement to analyze tweets.

Upvotes: 1

Views: 1548

Answers (3)

Kazuki Ohta

Reputation: 1441

The Fluentd log collector has just released its WebHDFS plugin, which lets users stream data into HDFS as it arrives.


Also, by using fluent-plugin-twitter, you can collect Twitter streams through the Twitter APIs. Of course, you can also write your own custom collector that posts the stream to Fluentd. Here's a Ruby example of posting logs to Fluentd.
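If you go the custom-collector route, the hand-off to Fluentd can be as small as one HTTP POST. The sketch below is in Python rather than Ruby and is not the linked example; it assumes Fluentd is running with an `in_http` source on `localhost:9880` (that input's default port), which takes the event tag from the URL path and the record from a form field named `json`. The tag `twitter.raw` and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fluentd has an in_http <source> listening on localhost:9880.
FLUENTD_URL = "http://localhost:9880"


def build_fluentd_request(record, tag, base_url=FLUENTD_URL):
    """Build a POST request that hands one record (e.g. a tweet dict) to in_http."""
    # in_http reads the event from a form-encoded field called "json";
    # the URL path becomes the Fluentd tag used for routing (e.g. to WebHDFS).
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{urllib.parse.quote(tag)}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_to_fluentd(record, tag, base_url=FLUENTD_URL, timeout=5):
    """Send one record to Fluentd and return the HTTP status code."""
    req = build_fluentd_request(record, tag, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A collector loop would call something like `post_to_fluentd(tweet, "twitter.raw")` for each tweet pulled from the Twitter API, and a matching `<match twitter.**>` section in the Fluentd config would route those events to the WebHDFS output.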

Upvotes: 1

David Gruzman

Reputation: 8088

I would look for a solution in the well-developed area of streaming logs into Hadoop, since the task is quite similar.
There are two existing systems that do this:
Flume: https://github.com/cloudera/flume/wiki
and
Scribe: https://github.com/facebook/scribe

So your task is only to pull the data from Twitter (which I assume is not part of this question) and feed it into one of these systems as logs.
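One simple way to "feed these systems with logs" is to write each pulled tweet as one line of JSON to a local file that the log collector then ships to HDFS. The sketch below assumes such a setup, for example a Flume agent configured to tail the file; the file name `tweets.log` and the function `append_tweets` are hypothetical, and fetching the tweets themselves is left out, as in the answer above.

```python
import json
import os
import tempfile


def append_tweets(tweets, log_path):
    """Append each tweet (a dict) as one JSON line; return how many were written."""
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for tweet in tweets:
            # One record per line, so line-oriented collectors see one event per tweet.
            log.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical spool file; point the Flume (or Scribe) agent at the same path.
    path = os.path.join(tempfile.gettempdir(), "tweets.log")
    written = append_tweets([{"id": 1, "text": "hello"}], path)
    print(f"wrote {written} tweet(s) to {path}")
```

Newline-delimited JSON keeps each tweet self-describing, so the files that land in HDFS can be parsed later by whatever offline analysis job you run.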

Upvotes: 3

Debaditya

Reputation: 2497

This can be a solution to your problem.

Upvotes: 0
