user1258683

Reputation: 59

Apache Druid segment granularity

In the Apache Druid configuration you can select the granularity of the segments (hour/day/week/etc.). What happens if you change the granularity later? Will the new setting apply only to new data, leaving old segments as they are, or will Druid regenerate the old segments too? For example, if we decide to change from day granularity to week.

Upvotes: 2

Views: 1086

Answers (4)

JRob

Reputation: 1

This question is 2 years old but I found the answers falling a little short.

First, it's important to understand the distinction between segment granularity and query granularity. Check one of the other answers if you're unsure. See also: granularitySpec

Changing the segment granularity can be done pretty much at any time. However, it's important to note that time chunks must be consistent. Druid can't mix segment sizes for the same time chunk. If a segment already appears in a given time chunk then all segments in that time chunk must be the same size.

If you are using a streaming spec then this will be rolled over for you automatically. The rest of my answer pertains to how Druid rolls over the segment size for a streaming spec and may provide guidance if you are defining your own time chunks using a batch spec.

Decreasing the segment size

When decreasing segment size, Druid will finish the current segment interval before using the new one. For example, changing from DAY to HOUR will continue to use the DAY segment size until the end of the day (UTC). At the point where a new DAY segment would normally be created, Druid creates a new segment using the HOUR segment size instead.

Example: If you change segment size from DAY to HOUR at 2024-10-23T10:45 then you will see the following segments:

...
2024-10-23T00:00_2024-10-24T00:00
2024-10-24T00:00_2024-10-24T01:00
2024-10-24T01:00_2024-10-24T02:00
...
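
The listing above can be reproduced with a small sketch. This only models the behavior described in this answer (finish the current DAY interval, then switch to HOUR); it is not Druid's own allocation code, and the function name `segments_after_decrease` is made up for illustration.

```python
from datetime import datetime, timedelta

def segments_after_decrease(change_time, count=3):
    """Return the current DAY interval followed by the first HOUR intervals."""
    day_start = change_time.replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = day_start + timedelta(days=1)
    # The DAY segment that is in progress at change_time is finished first.
    intervals = [(day_start, day_end)]
    cursor = day_end
    # From the next day boundary on, HOUR segments are used.
    for _ in range(count - 1):
        intervals.append((cursor, cursor + timedelta(hours=1)))
        cursor += timedelta(hours=1)
    return [f"{s:%Y-%m-%dT%H:%M}_{e:%Y-%m-%dT%H:%M}" for s, e in intervals]

result = segments_after_decrease(datetime(2024, 10, 23, 10, 45))
print("\n".join(result))
```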

Increasing the segment size

When increasing segment size, Druid will try to fill the gap after the current segment interval using progressively larger segment sizes until it reaches your new segment size. For example, changing from HOUR to DAY will continue to create HOUR segments until the start of a larger segment interval, at which point it will switch to DAY if possible, or else use the next available segment size. See here for supported segment granularities.

Example 1: If you change segment size from HOUR to DAY at 2024-10-23T10:45 then you will see the following segments:

...
2024-10-23T10:00_2024-10-23T11:00
2024-10-23T11:00_2024-10-23T12:00
2024-10-23T12:00_2024-10-23T18:00
2024-10-23T18:00_2024-10-24T00:00
2024-10-24T00:00_2024-10-25T00:00
...
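
One way to picture this roll-over is a greedy rule: at each boundary, use the largest granularity (up to the new target) whose interval starts exactly at that boundary. The sketch below is only a model of what I observed, not Druid's implementation. The ladder `HOUR, SIX_HOUR, DAY` is an assumption chosen to match Example 1 (Druid supports more granularities than this), and `rollover` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumed ladder for illustration only: (name, length in hours).
LADDER = [("HOUR", 1), ("SIX_HOUR", 6), ("DAY", 24)]

def rollover(start, target, steps):
    """At each step pick the largest allowed granularity aligned to the cursor."""
    allowed = []
    for name, hours in LADDER:
        allowed.append((name, hours))
        if name == target:
            break
    out, cursor = [], start
    for _ in range(steps):
        # Alignment is measured in whole hours since UTC midnight.
        aligned = [g for g in allowed if cursor.hour % g[1] == 0]
        name, hours = max(aligned, key=lambda g: g[1])
        end = cursor + timedelta(hours=hours)
        out.append(f"{cursor:%Y-%m-%dT%H:%M}_{end:%Y-%m-%dT%H:%M}")
        cursor = end
    return out

# Start at the HOUR segment that is open when the change is made (10:45).
result = rollover(datetime(2024, 10, 23, 10, 0), "DAY", steps=5)
print("\n".join(result))
```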

Example 2: If you change segment size from MINUTE to HOUR at 2024-10-23T10:03 then you will see the following segments:

...
2024-10-23T10:03_2024-10-23T10:04
2024-10-23T10:04_2024-10-23T10:05
2024-10-23T10:05_2024-10-23T10:10
2024-10-23T10:10_2024-10-23T10:15
2024-10-23T10:15_2024-10-23T10:30
2024-10-23T10:30_2024-10-23T11:00
2024-10-23T11:00_2024-10-23T12:00
...

Notes

Important Caveat: There must be existing segment files. I have had issues changing the segment granularity on a suspended supervisor that has had its data dropped. In this scenario you must match the old segment granularity or the ingestion job will fail.

Note: A manual compaction job can be used to convert historical segments to the new segment granularity. Just be sure to set the interval so that it contains entire segment files or those segments will be skipped.
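
As a rough sketch of such a compaction job, the snippet below widens an interval so it covers whole DAY segments and builds a task payload with a new segment granularity. The field names follow my understanding of the Druid compaction task spec, so check them against the docs for your version. The datasource name `my_datasource` and the helper names are placeholders.

```python
import json
from datetime import datetime, timedelta

def align_to_days(start, end):
    """Widen [start, end) so it covers whole DAY segments (UTC)."""
    floor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    ceil = end.replace(hour=0, minute=0, second=0, microsecond=0)
    if ceil < end:
        ceil += timedelta(days=1)
    return floor, ceil

def compaction_task(datasource, start, end, new_granularity):
    """Build a compaction task payload (field names assumed from the Druid docs)."""
    s, e = align_to_days(start, end)
    interval = f"{s:%Y-%m-%dT%H:%M:%S}/{e:%Y-%m-%dT%H:%M:%S}"
    return {
        "type": "compact",
        "dataSource": datasource,
        "ioConfig": {
            "type": "compact",
            "inputSpec": {"type": "interval", "interval": interval},
        },
        "granularitySpec": {"segmentGranularity": new_granularity},
    }

task = compaction_task(
    "my_datasource",  # placeholder datasource name
    datetime(2024, 10, 14, 5, 30),
    datetime(2024, 10, 20, 18, 0),
    "WEEK",
)
print(json.dumps(task, indent=2))
```

The printed JSON would then be submitted to the Overlord as a task; the interval here was chosen to span a full Monday-to-Monday week so that the resulting WEEK segment is complete.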

Note: I have tested this on Druid 29.0.1 and many previous versions.

Note: Enabling the experimental concurrent locks feature may or may not impact my answer. I have no experience with concurrent locks but I suspect it won't impact how segment granularity is window rolled.

See also: Time Chunks

Upvotes: 0

OurNewestMember

Reputation: 41

In short, you can change the segment granularity for newly created segments going forward. However, other cluster features can behave differently after the change, so whether you "can" change it in practice depends on whether something breaks.

  • Existing segments are immutable, so they retain whatever segment granularity in effect during segment creation.

    • (Of course you could overshadow those segments by replacing them with new segments of a different granularity such as through compaction, but the original segments usually stick around as unused segments, just not loaded for serving queries)
  • You are free to change the new segment granularity in the future (for creating new segments)

    • But if you want to maintain uninterrupted ingestion, you can run a test beforehand to anticipate possible ingest problems with different segment granularities.

These are the types of ingest errors that can occur when changing the segment granularity

But it is less likely you would see other problems like broken queries with different/mixed segment granularities.

  • However, you could see waiting or failed compaction or other batch ingest jobs, depending on the change in segment granularity and how aggressively you launch your tasks.
  • Also, with a very large cluster you could see problems from having too many segments (or too many tasks to process the segments). This is unlikely but possible.
  • On the other hand, if you had slow or failing compaction tasks, they might run better with a different segment granularity.

Upvotes: 0

58k723f1

Reputation: 619

Changing the granularity does not affect data that was stored previously. If you want the old data to use the new granularity as well, you can rewrite it with a reindex task.

Please note that there is a difference between segment granularity and query granularity.

In a nutshell, segment granularity describes how much data will be stored in 1 segment. Query granularity describes the size of the "grouped" data which is returned. These 2 can be different from each other.

For example, you can have a segment granularity of "week" and a query granularity of "hour". In this situation, all your data is stored in one file per week, and the smallest time bucket inside it is one hour.
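
A minimal sketch of that combination, written as the `granularitySpec` fragment of an ingestion spec plus two helpers that show which bucket a timestamp falls into at each level. The helper names are made up for illustration, and the week helper assumes weeks start on Monday.

```python
import json
from datetime import datetime, timedelta

# granularitySpec fragment: weekly segments, hourly buckets inside them.
granularity_spec = {
    "type": "uniform",
    "segmentGranularity": "WEEK",
    "queryGranularity": "HOUR",
}

def floor_to_hour(ts):
    """Query granularity HOUR: timestamps are truncated to the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def week_start(ts):
    """Segment granularity WEEK: start of the week, assuming Monday-based weeks."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())

ts = datetime(2024, 10, 23, 10, 45)
print(json.dumps(granularity_spec, indent=2))
print("stored timestamp:", floor_to_hour(ts).isoformat())
print("segment starts:  ", week_start(ts).isoformat())
```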

If you are using PHP, you can use this package which allows you to easily compact or re-index your segments to different granularity sizes.

Upvotes: 1

user17936657

Reputation: 131

What will happen if you change the granularity later? Will the new settings be applied just to the new data and old segments will be left as it is

Segments are immutable, so changing the granularity will only apply to new data.

[Will] old segments be left as they are, or will it regenerate old segments too? For example, if we decide to change from day granularity to week

Old segments will retain the granularity with which they were ingested, while new segments will be committed and published to deep storage with the updated granularity.

In other words, with your example, old segments would retain their day granularity, while new segments would be committed and published with week granularity.

Upvotes: 3
