Reputation: 1394
I have a problem at hand that requires me to monitor a set of files that are being accessed. The files can be accessed both from the Hadoop file system and from the local file system on the Linux machine.
I need to constantly monitor and integrate logs from both file systems for a specific group of files.
Any ideas on how this can be done?
Upvotes: 2
Views: 187
Reputation: 1494
Starting with Apache Hadoop 2.6.0 (or, for Cloudera CDH users, 5.2.0 onwards), HDFS ships inotify-like features. The JIRA for this (HDFS-6634) carries a design document detailing the form of implementation HDFS supports for such a need.
A test case from the same implementation further illustrates how to make use of the feature: TestDFSInotifyEventInputStream
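As a rough illustration of what consuming that event stream looks like, here is a minimal sketch. It is written against recent Hadoop 2.x+ client APIs (`HdfsAdmin.getInotifyEventStream()`); the NameNode URI is a placeholder you would replace with your cluster's, and the event-batching shape (`EventBatch`) may differ slightly across Hadoop versions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;

import java.net.URI;

public class HdfsEventMonitor {
    public static void main(String[] args) throws Exception {
        // Placeholder URI: point this at your own NameNode.
        HdfsAdmin admin = new HdfsAdmin(
                URI.create("hdfs://namenode:8020"), new Configuration());

        // Open the cluster-wide edit-log event stream (requires superuser).
        DFSInotifyEventInputStream stream = admin.getInotifyEventStream();

        while (true) {
            EventBatch batch = stream.take(); // blocks until events arrive
            for (Event event : batch.getEvents()) {
                switch (event.getEventType()) {
                    case CREATE:
                        System.out.println("created: "
                                + ((Event.CreateEvent) event).getPath());
                        break;
                    case CLOSE:
                        System.out.println("closed: "
                                + ((Event.CloseEvent) event).getPath());
                        break;
                    default:
                        break; // APPEND, RENAME, METADATA, UNLINK also exist
                }
            }
        }
    }
}
```

You would filter the reported paths against your specific group of files before logging. Note that the stream reflects namespace operations recorded in the NameNode's edit log, so it is well suited to create/append/rename/delete tracking rather than fine-grained read auditing.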
Remember also that by default the access-time precision (and thereby access tracking) in HDFS is very coarse, given its WORM performance semantics. You may want to lower the value of the NameNode configuration property dfs.namenode.accesstime.precision from its default of 1 hour (specified in ms).
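For example, to record access times at (roughly) one-minute precision, you could set the following in hdfs-site.xml on the NameNode (60000 ms is an illustrative value, not a recommendation; finer precision means more edit-log traffic):

```xml
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <!-- default is 3600000 (1 hour); 0 disables access times entirely -->
  <value>60000</value>
</property>
```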
Upvotes: 1