Reputation: 580
We have a log collection agent running against HDFS; that is, the agent (like Flume) keeps collecting logs from some applications and writes them to HDFS. The writing process runs without a break, so the destination files in HDFS keep growing.
And here is the question: since the input data changes continuously, what would happen to a MapReduce job if I set the collection agent's destination path as the job's input path?
FileInputFormat.addInputPath(job, new Path("hdfs://namenode:9000/data/collect"));
Upvotes: 1
Views: 81
Reputation: 6653
A MapReduce job processes only the data available when it starts: FileInputFormat computes the input splits at job submission, so files the agent creates (or data it appends) after that point are simply not part of the run. A file that is still open for writing may also expose only the data flushed so far.
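If you want to stay with batch jobs, a common pattern (my suggestion, not something specific to your setup) is to move the closed files into a run-specific directory first and use that as the job's input, so each run sees an immutable snapshot. A minimal sketch with the Hadoop FileSystem API, reusing your path; the ".tmp" check is an assumption about how your agent marks in-progress files (Flume's HDFS sink uses a ".tmp" suffix by default):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotInput {
    // Hypothetical helper: move the files that exist right now into a
    // run-specific directory, then submit the MapReduce job with that
    // directory as its input path.
    public static Path snapshot(Configuration conf) throws Exception {
        Path src = new Path("hdfs://namenode:9000/data/collect");
        Path dst = new Path("hdfs://namenode:9000/data/run-" + System.currentTimeMillis());
        FileSystem fs = src.getFileSystem(conf);
        fs.mkdirs(dst);
        for (FileStatus st : fs.listStatus(src)) {
            // Skip files the agent may still be writing (assumed ".tmp" suffix).
            if (!st.getPath().getName().endsWith(".tmp")) {
                fs.rename(st.getPath(), new Path(dst, st.getPath().getName()));
            }
        }
        return dst;
    }
}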
MapReduce is built for batch data processing. For continuous data processing, use a stream-processing tool like Storm or Spark Streaming.
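For illustration, here is a minimal Spark Streaming sketch (using the path from your question) that watches the collection directory. Note that textFileStream only picks up files created after the stream starts, so the agent would need to roll and close files periodically rather than append to one file forever:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CollectDirStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("collect-dir-stream");
        // Check the directory for new files every 30 seconds.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Monitors the directory and processes files that appear after the
        // stream starts; it does not re-read files that are appended to.
        JavaDStream<String> lines =
            jssc.textFileStream("hdfs://namenode:9000/data/collect");

        // Placeholder processing: count the new lines in each batch.
        lines.foreachRDD(rdd ->
            System.out.println("new lines in this batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}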
Upvotes: 1