Yohn

Reputation: 580

What would happen to a MapReduce job if the input data in HDFS keeps growing?

We have a log collection agent running against HDFS; that is, the agent (like Flume) keeps collecting logs from some applications and writes them to HDFS. The reading and writing never stop, so the destination files in HDFS keep growing.

And here is the question: since the input data is changing continuously, what would happen to a MapReduce job if I set the collection agent's destination path as the job's input path?

FileInputFormat.addInputPath(job, new Path("hdfs://namenode:9000/data/collect"));

Upvotes: 1

Views: 81

Answers (1)

Mikhail Golubtsov

Reputation: 6653

A MapReduce job only processes the data that is available when it is submitted: the input splits are computed from the files (and their lengths) in the input path at job-submission time, so files the agent adds, or data it appends, after that point are simply not seen by that run.
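
For example, here is a minimal sketch of a driver that snapshots the directory at submission time, so it is explicit about which files belong to the batch. Assumptions: Hadoop's mapreduce API, the path from your question, and a ".tmp" filter that matches Flume's default in-use suffix (adjust for your agent's configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CollectDirSnapshotDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "log-batch");
        Path collectDir = new Path("hdfs://namenode:9000/data/collect");
        FileSystem fs = collectDir.getFileSystem(conf);
        // Add only the files that exist right now, skipping files the agent
        // is still writing (Flume marks those with a ".tmp" suffix by default).
        // Anything written after this point belongs to the next batch run.
        for (FileStatus status : fs.listStatus(collectDir)) {
            if (status.isFile() && !status.getPath().getName().endsWith(".tmp")) {
                FileInputFormat.addInputPath(job, status.getPath());
            }
        }
        // ... set mapper, reducer, output path, etc., then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}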

MapReduce is designed for batch processing. For continuous data processing, use tools like Storm or Spark Streaming instead.
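
As a rough sketch of the streaming alternative, the following uses Spark Streaming's file-based source on the same HDFS directory. Note this is an assumed setup, not your agent's: textFileStream only picks up files that appear in the directory after the stream starts, and expects them to land atomically (e.g. rolled/renamed into place when the agent finishes writing them):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CollectDirStream {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("log-stream");
        // Every 60 seconds, process whatever new files have landed in the directory.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));
        JavaDStream<String> lines = jssc.textFileStream("hdfs://namenode:9000/data/collect");
        // Placeholder action: count the log lines in each micro-batch.
        lines.foreachRDD(rdd -> System.out.println("log lines in this batch: " + rdd.count()));
        jssc.start();
        jssc.awaitTermination();
    }
}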

Upvotes: 1
