Ahmed

Reputation: 719

Flume HDFS Source

I want to use Flume to transfer data from one HDFS directory to another HDFS directory, applying morphline processing to the data during the transfer.

For example, my source is

"hdfs://localhost:8020/user/flume/data"

and my sink is

"hdfs://localhost:8020/user/morphline/"

Is this possible with Flume?

If so, which Flume source type should I use?

Upvotes: 1

Views: 3853

Answers (2)

Boris Churzin

Reputation: 1203

Another option is to connect a netcat source to the same sink and simply cat the files into it.
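A minimal Python sketch of the client side of this approach, assuming a Flume netcat source is already listening on localhost:44444 (the host and port are assumptions; use whatever your agent config sets). The function names are hypothetical. It assumes the netcat source's default behaviour of treating each newline-terminated line as one event and answering "OK" per event; by default the source also rejects lines longer than 512 bytes, so very long records would need that limit raised.

```python
import socket
import subprocess
from typing import Iterable, Iterator


def hdfs_cat_lines(hdfs_path: str) -> Iterator[bytes]:
    """Hypothetical helper: stream an HDFS file's lines via `hadoop fs -cat` (needs the Hadoop CLI)."""
    proc = subprocess.Popen(["hadoop", "fs", "-cat", hdfs_path], stdout=subprocess.PIPE)
    assert proc.stdout is not None
    for line in proc.stdout:
        yield line
    proc.wait()


def send_to_netcat_source(lines: Iterable[bytes], host: str = "localhost", port: int = 44444) -> int:
    """Send each line as one event to a Flume netcat source; return how many were acknowledged."""
    sent = 0
    with socket.create_connection((host, port)) as sock, sock.makefile("rb") as reader:
        for line in lines:
            # The netcat source splits events on newlines, so make sure every record ends with one.
            if not line.endswith(b"\n"):
                line += b"\n"
            sock.sendall(line)
            # Assumes the default ack-every-event behaviour: one "OK" reply line per event.
            ack = reader.readline()
            if ack.strip() != b"OK":
                raise RuntimeError(f"event rejected by netcat source: {ack!r}")
            sent += 1
    return sent
```

Usage would then be something like send_to_netcat_source(hdfs_cat_lines("/user/flume/data/part-00000")), where the file name is only an example.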

Upvotes: 0

frb

Reputation: 3798

As far as I know, there is no source for reading HDFS data. The main reason is that Flume is intended for moving large amounts of data that are somehow sent to the agent. As stated in the documentation:

"A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink. A similar flow can be defined using a Thrift Flume Source to receive events from a Thrift Sink or a Flume Thrift Rpc Client or Thrift clients written in any language generated from the Flume thrift protocol."

All the available sources are listed on the official web page.

That being said, you will need some process in charge of reading the input HDFS file and sending it to one of the available sources. The ExecSource is probably suitable for your needs, since you can specify a command that will be run in order to produce the input data. Such a command could be hadoop fs -cat /hdfs/path/to/input/data or something similar.

Nevertheless, thinking about the processing you want to do, I guess you will need a custom sink in order to achieve it. The source part is just for reading the data and putting it into the Flume channel in the form of Flume events. Then one or more sinks consume those events, processing them and generating the appropriate output.
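To make this division of labour concrete, here is a toy, in-process Python model of the source, channel, and sink roles. Event, Channel, run_source, run_sink, and example_transform are hypothetical names and not Flume's API. A real custom sink would be a Java class whose process() method takes events from the channel inside a transaction, which this sketch omits. example_transform is only a placeholder standing in for the morphline logic.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterable, List, Optional


@dataclass
class Event:
    # Like a Flume event: an opaque byte body plus string headers.
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


class Channel:
    """Toy bounded FIFO buffer standing in for a Flume channel."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._events: Deque[Event] = deque()

    def put(self, event: Event) -> None:
        if len(self._events) >= self.capacity:
            raise OverflowError("channel is full")
        self._events.append(event)

    def take(self) -> Optional[Event]:
        return self._events.popleft() if self._events else None


def run_source(lines: Iterable[bytes], channel: Channel) -> None:
    """Source side: only wraps raw input lines into events; no processing happens here."""
    for line in lines:
        channel.put(Event(body=line.rstrip(b"\n")))


def run_sink(channel: Channel, process: Callable[[Event], Optional[bytes]],
             out: List[bytes], batch_size: int = 100) -> int:
    """Sink side: drain up to batch_size events, process each one, and write the results."""
    handled = 0
    while handled < batch_size:
        event = channel.take()
        if event is None:
            break
        result = process(event)
        if result is not None:  # None means "drop this record"
            out.append(result)
        handled += 1
    return handled


def example_transform(event: Event) -> Optional[bytes]:
    # Hypothetical processing step: drop empty lines and upper-case the rest.
    if not event.body:
        return None
    return event.body.upper()
```

The point of the split is that the source stays a dumb reader while all record-level logic lives on the consuming side, which is why the answer places the morphline-style processing in the sink.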

Upvotes: 5
