Reputation: 1285
Is it possible to specify a sampling rate in Flume so that records are sampled before they get written to HDFS? Is there a Flume sink config for doing that, or do we need to write our own Flume interceptor for sampling? I could not find any documentation on this in the Apache Flume User Guide.
Upvotes: 0
Views: 97
Reputation: 2759
Yes, you can achieve that by specifying the batch size in the HDFS sink:
# 100 is the default.
hdfs.batchSize = 100
You should also make sure the channel capacity is large enough to hold at least one full batch.
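For context, here is a minimal sketch of how the batch size and channel capacity might fit together in an agent's properties file. The agent name `a1`, channel name `c1`, sink name `k1`, the path, and the numeric values are all placeholders chosen for illustration, and the source definition is omitted. As far as I know, `hdfs.batchSize` sets how many events are written to a file before it is flushed to HDFS; it does not by itself drop any events.

```properties
# Hypothetical agent "a1" with channel "c1" and sink "k1"; names, path, and values are placeholders.
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = memory
# Total number of events the channel can hold; kept well above the sink batch size.
a1.channels.c1.capacity = 10000
# Maximum events per transaction; should be at least the sink's batch size.
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
# Number of events written to the file before it is flushed to HDFS.
a1.sinks.k1.hdfs.batchSize = 1000
```

If the sink's batch size exceeds the channel's `transactionCapacity`, the sink cannot take a full batch in one transaction, so the two values are usually sized together.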
Upvotes: 1