Happy

Reputation: 121

What is intermediate persist in Apache Druid?

How does Druid persist real-time ingested data before it hands the data off to deep storage?

The documentation lists the configuration options intermediatePersistPeriod and maxPendingPersists, but it doesn't say much about what an intermediate persist is or how it works.

From the name, I assume it periodically persists real-time data that is held in memory. With a segment granularity of hours or days, having no mechanism to persist data before the segment interval ends would create availability and reliability issues.

Upvotes: 3

Views: 1097

Answers (1)

Jor

Reputation: 376

Excellent question! The persist call is defined in the Appenderator interface, which provides the API for indexing data, persisting it during ingestion, and pushing it to deep storage. The comment on the method "Appenderator.persistAll()" reads:

Persist any in-memory indexed data to durable storage. This may be only somewhat durable, e.g. the machine's local disk.

The default implementation is in AppenderatorImpl.java, where persistAll() ultimately calls writeCommit() to record the persist; writeCommit() in turn writes values to a JSON file called commit.json.

So, to answer your question: by default, Druid persists real-time ingested data by writing it to the Peon's local disk.
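
To make the idea concrete, here is a minimal toy sketch in Python. It is not Druid's code. It assumes a persist is triggered either when a period elapses (in the spirit of intermediatePersistPeriod) or when the in-memory buffer reaches a row limit; the class ToyAppenderator, its parameters persist_period_s and max_rows_in_memory, and the spill-N.json file names are all hypothetical, made up for illustration.

```python
import json
import os
import tempfile
import time


class ToyAppenderator:
    """Toy model of intermediate persists; NOT Druid's actual implementation."""

    def __init__(self, base_dir, persist_period_s=600.0,
                 max_rows_in_memory=3, clock=time.monotonic):
        self.base_dir = base_dir                      # stands in for the Peon's local disk
        self.persist_period_s = persist_period_s      # hypothetical period trigger
        self.max_rows_in_memory = max_rows_in_memory  # hypothetical row-count trigger
        self.clock = clock
        self.rows = []          # rows currently held only in memory
        self.spill_files = []   # intermediate files already written to disk
        self.last_persist = clock()

    def add(self, row):
        """Buffer a row, then persist if either trigger has fired."""
        self.rows.append(row)
        period_elapsed = self.clock() - self.last_persist >= self.persist_period_s
        if len(self.rows) >= self.max_rows_in_memory or period_elapsed:
            self.persist_all()

    def persist_all(self):
        """Write any in-memory rows to a local file and clear the buffer."""
        if self.rows:
            path = os.path.join(self.base_dir, f"spill-{len(self.spill_files)}.json")
            with open(path, "w") as f:
                json.dump(self.rows, f)
            self.spill_files.append(path)
            self.rows = []
        self.last_persist = self.clock()

    def hand_off(self):
        """Persist what is left, then merge all spill files into one result."""
        self.persist_all()
        merged = []
        for path in self.spill_files:
            with open(path) as f:
                merged.extend(json.load(f))
        return merged


def demo():
    with tempfile.TemporaryDirectory() as d:
        app = ToyAppenderator(d, max_rows_in_memory=3)
        for i in range(7):
            app.add({"ts": i, "value": i * 10})
        spills_before_handoff = len(app.spill_files)
        rows_in_memory = len(app.rows)
        merged = app.hand_off()
        return spills_before_handoff, rows_in_memory, len(merged)
```

Calling demo() ingests seven rows with a three-row limit, so two intermediate files exist on disk and one row is still in memory before hand-off. The point of the sketch is only that rows already spilled to local disk no longer live solely in memory while the segment is still open.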

Upvotes: 1
