Reputation: 580
We have a log collection agent running against HDFS; that is, the agent (like Flume) keeps collecting logs from some applications and writes them to HDFS. The writing process runs without a break, so the destination files in HDFS keep growing.
And here is the question: since the input data changes continuously, what would happen to a MapReduce job if I set the collection agent's destination path as the job's input path?
FileInputFormat.addInputPath(job, new Path("hdfs://namenode:9000/data/collect"));
Upvotes: 1
Views: 81
Reputation: 6653
A MapReduce job processes only the data available when it starts: FileInputFormat computes the input splits at job submission, so files the agent creates (or data it appends) after that point are simply not part of the run. A file that is still open for writing may also expose only the data flushed so far.
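If you want to stay with batch jobs, a common pattern (my suggestion, not something specific to your setup) is to move the closed files into a run-specific directory first and use that as the job's input, so each run sees an immutable snapshot. A minimal sketch with the Hadoop FileSystem API, reusing your path; the ".tmp" check is an assumption about how your agent marks in-progress files (Flume's HDFS sink uses a ".tmp" suffix by default):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotInput {
    // Hypothetical helper: move the files that exist right now into a
    // run-specific directory, then submit the MapReduce job with that
    // directory as its input path.
    public static Path snapshot(Configuration conf) throws Exception {
        Path src = new Path("hdfs://namenode:9000/data/collect");
        Path dst = new Path("hdfs://namenode:9000/data/run-" + System.currentTimeMillis());
        FileSystem fs = src.getFileSystem(conf);
        fs.mkdirs(dst);
        for (FileStatus st : fs.listStatus(src)) {
            // Skip files the agent may still be writing (assumed ".tmp" suffix).
            if (!st.getPath().getName().endsWith(".tmp")) {
                fs.rename(st.getPath(), new Path(dst, st.getPath().getName()));
            }
        }
        return dst;
    }
}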
MapReduce is built for batch data processing. For continuous data processing, use a stream-processing tool like Storm or Spark Streaming.
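For illustration, here is a minimal Spark Streaming sketch (using the path from your question) that watches the collection directory. Note that textFileStream only picks up files created after the stream starts, so the agent would need to roll and close files periodically rather than append to one file forever:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CollectDirStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("collect-dir-stream");
        // Check the directory for new files every 30 seconds.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Monitors the directory and processes files that appear after the
        // stream starts; it does not re-read files that are appended to.
        JavaDStream<String> lines =
            jssc.textFileStream("hdfs://namenode:9000/data/collect");

        // Placeholder processing: count the new lines in each batch.
        lines.foreachRDD(rdd ->
            System.out.println("new lines in this batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}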
Upvotes: 1