Reputation: 3
I've noticed that Druid query performance benefits from previous queries, and I'm trying to understand why. I know that Druid uses a cache (I have caching enabled on the Broker), but this cache only stores query results per segment (right?). However, I have noticed that performance improves whenever subsequent queries use the same segments.
Example:
I searched and found that this behavior can be explained by the OS page cache. Based on my research, I think that during the first query to the datasource, Druid loads the necessary segments into memory (the OS page cache), so the segments can be read faster by subsequent queries.
Am I right? I looked in the Druid documentation and I was unable to find anything helpful.
Can you please give me some help explaining this awesome behavior?
Best regards,
José Correia
Upvotes: 0
Views: 1046
Reputation: 4544
Druid does use caching to improve performance at various levels: at the segment level on Historicals and at the query level on the Broker. The more memory you give it, the faster it works.
Below is the documentation on caching:
Query Caching
Druid supports query result caching through an LRU cache. Results are stored on a per segment basis, along with the parameters of a given query. This allows Druid to return final results based partially on segment results in the cache and partially on segment results from scanning historical/real-time segments.
Segment results can be stored in a local heap cache or in an external distributed key/value store. Segment query caches can be enabled at either the Historical or the Broker level (it is not recommended to enable caching on both).
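To make the per-segment idea concrete, here is a minimal Python sketch of an LRU cache keyed by segment and query parameters. It is only an illustration of the concept described above, not Druid's actual implementation; `SegmentResultCache`, `run_query`, and `scan_segment` are hypothetical names, and the "result" is just a number that gets summed.

```python
from collections import OrderedDict


class SegmentResultCache:
    """Toy LRU cache keyed by (segment id, query parameters). Hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, segment_id, query_key):
        key = (segment_id, query_key)
        if key not in self._entries:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, segment_id, query_key, result):
        key = (segment_id, query_key)
        self._entries[key] = result
        self._entries.move_to_end(key)
        # Evict the least recently used entry when over capacity.
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


def run_query(cache, segment_ids, query_key, scan_segment):
    """Merge cached and freshly scanned per-segment results.

    Returns (merged_total, list_of_segments_that_had_to_be_scanned).
    """
    total = 0
    scanned = []
    for seg in segment_ids:
        partial = cache.get(seg, query_key)
        if partial is None:
            # Cache miss: scan the segment and remember its partial result.
            partial = scan_segment(seg)
            cache.put(seg, query_key, partial)
            scanned.append(seg)
        total += partial
    return total, scanned
```

Note that the key includes the query parameters, so in this sketch a different query over the same segments would still miss the cache and scan every segment again.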
Query caching on Brokers
Enabling caching on the broker can yield faster results than if query caches were enabled on Historicals for small clusters. This is the recommended setup for smaller production clusters (< 20 servers). Take note that when caching is enabled on the Broker, results from Historicals are returned on a per segment basis, and Historicals will not be able to do any local result merging.
Query caching on Historicals
Larger production clusters should enable caching only on the Historicals to avoid having to use Brokers to merge all query results. Enabling caching on the Historicals instead of the Brokers enables the Historicals to do their own local result merging and puts less strain on the Brokers.
The Druid Broker doesn't do anything directly with the OS page cache; if the OS has virtual memory available, then the heaps are allocated based on memory requirements.
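If you want to check how much of the speedup comes from the query cache, one option is to rerun the same query with the cache switched off through the `useCache` and `populateCache` query context parameters. A minimal Python sketch of building such a query body follows; the datasource name, the interval, and the `timeseries_query` helper are made up for illustration.

```python
import json


def timeseries_query(datasource, interval, use_cache):
    """Build a native Druid timeseries query body (hypothetical helper)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "aggregations": [{"type": "count", "name": "rows"}],
        # Query context flags that control the per-segment result cache.
        "context": {"useCache": use_cache, "populateCache": use_cache},
    }


# "my_datasource" and the interval are placeholders, not values from the question.
body = json.dumps(
    timeseries_query("my_datasource", "2020-01-01/2020-01-02", use_cache=False)
)
print(body)
```

You would POST this body to the Broker's `/druid/v2` endpoint twice and compare timings. If the second run is still faster with both flags set to false, the improvement is probably not coming from the query result cache.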
Upvotes: 2