Valentin Proust

Reputation: 1

Cloudera Impala performance test - Empty cache

I am trying to run a performance test on a Cloudera Hadoop cluster. However, since Impala seems to use a cache to store previous queries, how can I empty that cache?

Does Impala use caching? Impala does not cache data but it does cache some table and file metadata. Although queries might run faster on subsequent iterations because the data set was cached in the OS buffer cache, Impala does not explicitly control this.

Quoted from: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_faq.html#faq_performance_unique_1__faq_caching_unique_1

Upvotes: 0

Views: 2383

Answers (1)

Matt

Reputation: 4334

File metadata caching is different from "query caching". It only caches the locations of files and blocks in HDFS, which is something most databases already know but Impala may not, because it gets table/file metadata from Hive. The file metadata should be available to Impala in your tests.

Impala never caches queries, but file data may be cached in one of two ways:

  1. You've enabled HDFS caching. I assume you're not doing this.
  2. Some data read by HDFS may be in the OS buffer cache. Impala has no control over this. Some googling turns up guidance about clearing the Linux buffer caches, e.g. this unix.stackexchange.com answer.
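
For the second case, a minimal sketch of dropping the Linux page cache between test runs might look like the following. It assumes a Linux host, root privileges, and the standard `/proc/sys/vm/drop_caches` control file; the function name `drop_os_caches` is hypothetical and is not part of Impala or HDFS.

```python
import os

# Standard Linux procfs control file for dropping kernel caches.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_os_caches(mode: int = 3, path: str = DROP_CACHES) -> None:
    """Ask the kernel to drop clean caches (hypothetical helper).

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real control file requires root.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2, or 3")
    # Flush dirty pages first; only clean pages can be dropped.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")


if __name__ == "__main__":
    drop_os_caches()
```

Because each node keeps its own buffer cache, this would presumably need to run on every host that serves data to Impala before each cold-cache measurement.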

Upvotes: 0
