Reputation: 7362
I'm using Impala, and I know Impala does its processing in memory. I've searched for a list of Impala configuration options, but I haven't found any thorough documentation, particularly with regard to memory/heap. Does Impala have such settings, or does it rely on the HDFS DataNode heap space? I know you can cap Impala's memory usage with -mem_limit, but I'm trying to better understand how this is done.
Upvotes: 0
Views: 2722
Reputation: 4236
As of the Impala 1.4.0 release, included in CDH 5.1.0, Impala uses both memory and disk during query processing. To learn more about how to control Impala's use of memory, I recommend reading through the Cloudera documentation on Impala, especially:
You'll find more information on how to configure many aspects of Impala's memory use, including integration with HDFS caching and Hadoop YARN (via Llama). For more on HDFS caching, see Andrew Wang and Colin McCabe's presentation from Hadoop Summit 2014. For more on Llama, see Henry Robinson's presentation from Hadoop World NYC 2013.
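To make the -mem_limit cap from the question a little more concrete, here is a minimal Python sketch of how such a limit could be resolved to a byte count. It assumes the flag accepts either a percentage of physical memory (such as 70%) or an absolute size with a b/k/m/g suffix; `resolve_mem_limit` and its parameters are hypothetical names used for illustration, and this is not Impala's actual code.

```python
def resolve_mem_limit(spec: str, physical_bytes: int) -> int:
    """Translate a limit such as '70%' or '2g' into bytes.

    Hypothetical helper for illustration only; it is not part of Impala.
    """
    # Assumed binary (1024-based) unit suffixes.
    units = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = spec.strip().lower()
    if not text:
        raise ValueError("empty memory limit")

    # Percentage form: a share of the machine's physical memory.
    if text.endswith("%"):
        percent = float(text[:-1])
        if not 0 < percent <= 100:
            raise ValueError(f"percentage out of range: {spec!r}")
        return int(physical_bytes * percent / 100)

    # Absolute form with a unit suffix, e.g. '512m' or '2g'.
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])

    # Bare number: treated as a byte count.
    return int(text)


if __name__ == "__main__":
    ram = 64 * 1024 ** 3  # pretend the host has 64 GiB of RAM
    for spec in ("70%", "2g", "512m"):
        print(spec, "->", resolve_mem_limit(spec, ram), "bytes")
```

The point of the sketch is only that the cap resolves to one fixed number of bytes per daemon. How Impala enforces that number, and what happens to a query that would exceed it, is covered in the Cloudera documentation mentioned above.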
Upvotes: 2