Engineiro

Reputation: 1146

Memory impact of Symbolic Links in Namenode

Symbolic links are supported in Hadoop 2.0 through the FileContext object's createSymlink() method.

I am planning to use symlinks heavily in a program that places all files from the previous month into Hadoop Archives (HARs). I am wondering whether symlinks consume NameNode memory the way small files in HDFS do. If they do, that would defeat the purpose of placing the files in HARs and bring me full circle to the original small-files problem.

The reason I want to use symlinks is that when the files are HAR'ed (and, as a consequence, moved), I don't have to update HBase with the new file locations.

What is the memory footprint of symlinks in a NameNode?

Upvotes: 0

Views: 322

Answers (1)

Engineiro

Reputation: 1146

This was the answer I received on the cdh-user mailing list from a Cloudera employee:

Hi Geovanie,

The NN memory footprint for a symlink is less than that of a small file, because symlinks are purely metadata and do not have associated blocks. Block count is normally the real reason why you want to avoid small files. I'd expect you to be able to have millions of symlinks with a large enough NN heap.

I'll note though that symlinks are currently only supported in FileContext, while most applications are written against FileSystem (including the FsShell). This means that they will not behave correctly with symlinks. This might change in a future release, as we've been working on FileSystem symlink support upstream.

Best, Andrew
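
To put rough numbers on the point about block count, here is a back-of-the-envelope sketch. It is not from the email: it assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file, directory, or block), and it treats a symlink as a single object with no blocks. The class and method names are made up for illustration, and the real cost of a symlink also depends on the length of the target path it stores, so treat the output as an order-of-magnitude estimate only.

```java
public class NameNodeHeapEstimate {
    // Assumption: ~150 bytes of NameNode heap per namespace object
    // (file inode, directory inode, or block). This is a rule of thumb,
    // not a measured value for any particular Hadoop release.
    static final long BYTES_PER_OBJECT = 150L;

    // A file costs one inode object plus one object per block.
    static long smallFilesBytes(long fileCount, long blocksPerFile) {
        return fileCount * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    // A symlink is metadata only: one inode object, no blocks.
    // (The stored target path adds some bytes that are ignored here.)
    static long symlinksBytes(long linkCount) {
        return linkCount * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.printf("1M single-block files: ~%d MB%n",
                smallFilesBytes(n, 1) / 1_000_000);
        System.out.printf("1M symlinks:           ~%d MB%n",
                symlinksBytes(n) / 1_000_000);
    }
}
```

Under these assumptions, a million single-block files come to roughly 300 MB of heap and a million symlinks to roughly 150 MB, which is consistent with the claim that millions of symlinks fit in a large enough NameNode heap.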

Upvotes: 2
