Reputation: 59
In Apache Druid configuration you can select the granularity of the segments (hour/day/week/etc.). What will happen if you change the granularity later? Will the new settings be applied just to the new data, with old segments left as they are, or will Druid regenerate the old segments too? For example, if we decide to change from day granularity to week.
Upvotes: 2
Views: 1086
Reputation: 1
This question is 2 years old but I found the answers falling a little short.
First, it's important to understand the distinction between segment granularity and query granularity. Check one of the other answers if you're unsure. See also: granularitySpec
Changing the segment granularity can be done pretty much at any time. However, it's important to note that time chunks must be consistent. Druid can't mix segment sizes for the same time chunk. If a segment already appears in a given time chunk then all segments in that time chunk must be the same size.
If you are using a streaming spec then this will be rolled over for you automatically. The rest of my answer pertains to how Druid rolls over the segment size for a streaming spec and may provide guidance if you are defining your own time chunks using a batch spec.
Decreasing the segment size
When decreasing segment size, Druid will finish the current segment interval before using the new one. For example, changing from DAY to HOUR will continue to use the DAY segment size until the end of the day (UTC). At the point where a new DAY segment would normally be created, Druid creates a new segment using the HOUR segment size instead.
Example: If you change segment size from DAY to HOUR at 2024-10-23T10:45 then you will see the following segments:
...
2024-10-23T00:00_2024-10-24T00:00
2024-10-24T00:00_2024-10-24T01:00
2024-10-24T01:00_2024-10-24T02:00
...
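The rollover described above can be sketched in a few lines of Python. This is a toy model of the behaviour I observed, not Druid's allocation code: the function name `segments_after_decrease` and the helper `fmt` are made up for illustration, and the epoch-based flooring is an assumption that only holds for fixed sizes such as HOUR and DAY.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: chunk boundaries align to the UTC epoch


def fmt(start, end):
    """Format an interval as start_end, like the segment listing."""
    pattern = "%Y-%m-%dT%H:%M"
    return f"{start.strftime(pattern)}_{end.strftime(pattern)}"


def segments_after_decrease(changed_at, old_size, new_size, new_count):
    """Return the old-size chunk containing changed_at, then new_count new-size chunks."""
    # Floor the change time to the start of the old-size chunk that contains it.
    chunk_start = changed_at - (changed_at - EPOCH) % old_size
    chunk_end = chunk_start + old_size
    intervals = [(chunk_start, chunk_end)]
    # The new size only takes over once the old chunk has finished.
    cursor = chunk_end
    for _ in range(new_count):
        intervals.append((cursor, cursor + new_size))
        cursor += new_size
    return [fmt(start, end) for start, end in intervals]


# DAY -> HOUR, changed at 2024-10-23T10:45
for line in segments_after_decrease(
    datetime(2024, 10, 23, 10, 45), timedelta(days=1), timedelta(hours=1), new_count=2
):
    print(line)
```

WEEK and the calendar-based sizes (MONTH, QUARTER, YEAR) would need a different alignment rule, so the sketch leaves them out.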
Increasing the segment size
When increasing segment size, Druid will try to fill the gap between the current segment interval and the next boundary of the new segment size, using progressively larger segment sizes until it reaches your new segment size. For example, changing from HOUR to DAY will continue to create HOUR segments until the start of a larger segment interval, at which point it will switch to DAY, if possible, or else use the next available segment size. See here for supported segment granularities.
Example 1: If you change segment size from HOUR to DAY at 2024-10-23T10:45 then you will see the following segments:
...
2024-10-23T10:00_2024-10-23T11:00
2024-10-23T11:00_2024-10-23T12:00
2024-10-23T12:00_2024-10-23T18:00
2024-10-23T18:00_2024-10-24T00:00
2024-10-24T00:00_2024-10-25T00:00
...
Example 2: If you change segment size from MINUTE to HOUR at 2024-10-23T10:03 then you will see the following segments:
...
2024-10-23T10:03_2024-10-23T10:04
2024-10-23T10:04_2024-10-23T10:05
2024-10-23T10:05_2024-10-23T10:10
2024-10-23T10:10_2024-10-23T10:15
2024-10-23T10:15_2024-10-23T10:30
2024-10-23T10:30_2024-10-23T11:00
2024-10-23T11:00_2024-10-23T12:00
...
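Here is a small Python sketch that reproduces the shape of both examples. It is only a model of what I observed, not Druid's actual allocation logic. The names `LADDER`, `segments_after_increase` and `fmt` are hypothetical, and the ladder is deliberately limited to the sizes that show up in the two examples; Druid supports more granularities than these.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: chunk boundaries align to the UTC epoch

# Assumption: only the sizes that appear in the two examples.
LADDER = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(hours=6),
    timedelta(days=1),
]


def fmt(start, end):
    """Format an interval as start_end, like the segment listings."""
    pattern = "%Y-%m-%dT%H:%M"
    return f"{start.strftime(pattern)}_{end.strftime(pattern)}"


def segments_after_increase(changed_at, old_size, new_size, count):
    """Return `count` intervals, starting with the old-size chunk containing changed_at."""
    # The chunk that is open when the setting changes keeps its old size.
    start = changed_at - (changed_at - EPOCH) % old_size
    intervals = [(start, start + old_size)]
    cursor = start + old_size
    while len(intervals) < count:
        # Pick the largest ladder size, capped at the target size,
        # that has a chunk boundary exactly at the cursor.
        size = max(
            s for s in LADDER
            if s <= new_size and (cursor - EPOCH) % s == timedelta(0)
        )
        intervals.append((cursor, cursor + size))
        cursor += size
    return [fmt(a, b) for a, b in intervals]


# Example 1: HOUR -> DAY, changed at 2024-10-23T10:45
print("\n".join(segments_after_increase(
    datetime(2024, 10, 23, 10, 45), timedelta(hours=1), timedelta(days=1), count=5)))

# Example 2: MINUTE -> HOUR, changed at 2024-10-23T10:03
print("\n".join(segments_after_increase(
    datetime(2024, 10, 23, 10, 3), timedelta(minutes=1), timedelta(hours=1), count=7)))
```

The greedy "largest size that starts exactly here" rule is my reading of the pattern, chosen because it keeps every new segment on a proper chunk boundary so it never overlaps an existing one.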
Important Caveat: There must be existing segment files. I have had issues changing the segment granularity on a suspended supervisor that has had its data dropped. In this scenario you must match the old segment granularity or the ingestion job will fail.
Note: A manual compaction job can be used to convert historical segments to the new segment granularity. Just be sure to set the interval so that it contains entire segment files or those segments will be skipped.
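As a rough illustration of the compaction note, the following Python sketch widens an interval so that it covers whole DAY segments and then builds a manual compaction task spec with a new segment granularity. The data source name `my_datasource`, the helper `widen_to_whole_days` and the chosen dates are made up for the example, and it assumes the existing segments are DAY-sized. Check the task spec fields against the compaction documentation for your Druid version before submitting it to the Overlord.

```python
import json
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
DAY = timedelta(days=1)


def widen_to_whole_days(start, end):
    """Expand [start, end) outward so it only contains complete DAY chunks (UTC)."""
    floor_start = start - (start - EPOCH) % DAY
    # (EPOCH - end) % DAY is the distance from `end` up to the next midnight,
    # or zero when `end` already falls on midnight.
    ceil_end = end + (EPOCH - end) % DAY
    return floor_start, ceil_end


start, end = widen_to_whole_days(
    datetime(2024, 10, 1, 10, 45), datetime(2024, 10, 20, 3, 0)
)

compaction_task = {
    "type": "compact",
    "dataSource": "my_datasource",  # hypothetical data source name
    "ioConfig": {
        "type": "compact",
        "inputSpec": {
            "type": "interval",
            "interval": f"{start.isoformat()}/{end.isoformat()}",
        },
    },
    # Target granularity for the rewritten historical segments.
    "granularitySpec": {"segmentGranularity": "HOUR"},
}

print(json.dumps(compaction_task, indent=2))
```

The printed JSON is what you would post as a task. The sketch stops short of submitting it so that it runs without a cluster.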
Note: I have tested this on Druid 29.0.1 and many previous versions.
Note: Enabling the experimental concurrent locks feature may or may not affect my answer. I have no experience with concurrent locks, but I suspect it won't change how the segment granularity is rolled over.
See also: Time Chunks
Upvotes: 0
Reputation: 41
In short, you can change the segment granularity for newly created segments going forward. However, other cluster features can behave differently after the change, so in practice the risk of breakage determines whether you really "can" change it.
Existing segments are immutable, so they retain whatever segment granularity in effect during segment creation.
You are free to change the segment granularity in the future; the new value applies to newly created segments.
These are the types of ingest errors that can occur when changing the segment granularity
But it is less likely you would see other problems like broken queries with different/mixed segment granularities.
Upvotes: 0
Reputation: 619
Changing the granularity does not affect data that was stored previously. If you want to convert the existing data as well, you can do so with a reindex task.
Please note that there is a difference between segment granularity and query granularity.
In a nutshell, segment granularity describes how much data will be stored in 1 segment. Query granularity describes the size of the "grouped" data which is returned. These 2 can be different from each other.
For example, you can have a segment granularity of "week" and a query granularity of "hour". In this situation, all your data is stored in one file per week, and the smallest unit of data inside it is one hour.
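To make that concrete, here is roughly what the `granularitySpec` part of an ingestion spec looks like for this week/hour combination, written as a Python dict and printed as JSON. Treat it as a minimal sketch: only the `granularitySpec` fragment is shown, and `rollup` is set explicitly purely for illustration.

```python
import json

# Fragment of an ingestion spec: dataSchema.granularitySpec
granularity_spec = {
    "type": "uniform",
    "segmentGranularity": "WEEK",  # one time chunk (set of segment files) per week
    "queryGranularity": "HOUR",    # timestamps are truncated to the hour at ingestion
    "rollup": True,                # illustrative choice, not required for the example
}

print(json.dumps(granularity_spec, indent=2))
```

Changing `segmentGranularity` here only affects segments created after the change. The hour-level `queryGranularity` stays the finest resolution you can query.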
If you are using PHP, you can use this package which allows you to easily compact or re-index your segments to different granularity sizes.
Upvotes: 1
Reputation: 131
What will happen if you change the granularity later? Will the new settings be applied just to the new data, with old segments left as they are
Segments are immutable, so changing the granularity will only apply to new data.
[Will] old segments be left as they are, or will Druid regenerate the old segments too? For example, if we decide to change from day granularity to week
Old segments will retain the granularity with which they were ingested, while new segments will be committed and published to deep storage with the updated granularity.
In other words, with your example, old segments would retain their day granularity, while new segments would be committed and published with week granularity.
Upvotes: 3