Sambit Tripathy

Reputation: 444

hadoop fs -ls out of memory error

I have 300,000+ files in an HDFS data directory.

When I run hadoop fs -ls on it, I get an out of memory error saying the GC overhead limit has been exceeded. The cluster nodes have 256 GB of RAM each. How do I fix this?

Upvotes: 6

Views: 5494

Answers (2)

Jack Davidson

Reputation: 4943

You can make more memory available to the hdfs command by setting HADOOP_CLIENT_OPTS:

HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls /

Found here: http://lecluster.delaurent.com/hdfs-ls-and-out-of-memory-gc-overhead-limit/

This fixed the problem for me; I had over 400k files in one directory and needed to delete most, but not all, of them.
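
A rough sketch of that kind of clean-up (not from the answer itself): the same HADOOP_CLIENT_OPTS setting can be exported for the whole shell session, and the listing can then be filtered so only the unwanted files are deleted. The directory path and the part-tmp name pattern below are placeholders.

# Give the HDFS client JVM a larger heap for every command in this shell
export HADOOP_CLIENT_OPTS="-Xmx4g"

# List the directory, keep only regular files (lines starting with '-'),
# take the path column, match a placeholder name pattern, and delete the
# matches in batches of 500 paths per hdfs invocation.
hdfs dfs -ls /data/big_dir \
    | grep '^-' \
    | awk '{print $NF}' \
    | grep 'part-tmp' \
    | xargs -n 500 hdfs dfs -rm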

Upvotes: 13

Anay T

Reputation: 56

Write a Python script to split the files into multiple directories and then run through them (a rough sketch is below). First of all, what are you trying to achieve when you know you have 300,000+ files in one directory? If you want to concatenate them, it is better to arrange them into sub-directories first.
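
A minimal sketch of such a script, shelling out to the hdfs CLI; the source path, batch size, and sub-directory naming are placeholders, and if the initial listing itself runs out of memory you can combine this with the HADOOP_CLIENT_OPTS setting from the other answer.

import subprocess

SRC = "/data/big_dir"   # placeholder: the directory holding the 300k+ files
BATCH = 10000           # placeholder: number of files per sub-directory

# List the directory once; keep only regular files (lines starting with '-')
# and take the path, which is the last whitespace-separated field of each line.
listing = subprocess.check_output(["hdfs", "dfs", "-ls", SRC]).decode()
paths = [line.split()[-1] for line in listing.splitlines()
         if line.startswith("-")]

for i in range(0, len(paths), BATCH):
    dest = "%s/batch_%04d" % (SRC, i // BATCH)
    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", dest])
    batch = paths[i:i + BATCH]
    # Move in smaller chunks so a single command line stays within OS argument limits.
    for j in range(0, len(batch), 500):
        subprocess.check_call(["hdfs", "dfs", "-mv"] + batch[j:j + 500] + [dest])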

Upvotes: 1
