Reputation: 536
I understand that whenever I create a Global Secondary Index (GSI) for a DynamoDB table it will take some time to create that GSI (depending on table size).
From what I understood, loading the items from the base table to the GSI only consumes the WCU of the GSI.
Let's assume I have a DynamoDB table with terabytes of data in it. If I create a GSI with 1 WCU, how long will it take for the GSI to be created (if all the items and values have to be projected)? Could it take a very long time, such as multiple months? (The docs state it takes around 5 minutes.)
Upvotes: 1
Views: 660
Reputation: 13731
Indeed, when you add a GSI to a table with pre-existing data, a so-called backfilling process begins that reads the table's data and writes it to the GSI. There is no guarantee that this process can finish in 5 minutes. The documentation explains that the base table's RCU are not used, but the new index's WCU are used, so if you provision too few WCU on the index, the backfilling will be slow. For example, this document, section "Adding a Global Secondary Index to a large table", says that:
The time required for building a global secondary index depends on several factors, such as the following: ... The provisioned write capacity of the index ... If you are adding a global secondary index to a very large table, it might take a long time for the creation process to complete.
... If the provisioned write throughput setting on the index is too low, the index build will take longer to complete. To shorten the time it takes to build a new global secondary index, you can increase its provisioned write capacity temporarily. As a general rule, we recommend setting the provisioned write capacity of the index to 1.5 times the write capacity of the table. This is a good setting for many use cases. However, your actual requirements might be higher or lower.
The document recommends that you look at the OnlineIndexPercentageProgress CloudWatch metric to understand the amount of progress that the backfilling is making.
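As a rough sketch of how that metric could be polled, the snippet below builds a CloudWatch `get_metric_statistics` request for `OnlineIndexPercentageProgress` using boto3. The table name `orders`, the index name `by_customer`, the 30-minute lookback window and the helper names are all hypothetical, and the call itself needs AWS credentials, so treat it as an illustration rather than a tested monitoring script.

```python
from datetime import datetime, timedelta, timezone


def progress_query(table_name, index_name, now=None, lookback_minutes=30):
    """Build the CloudWatch request parameters for the backfill progress metric."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "OnlineIndexPercentageProgress",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "StartTime": now - timedelta(minutes=lookback_minutes),
        "EndTime": now,
        "Period": 60,               # one datapoint per minute
        "Statistics": ["Maximum"],  # progress only grows, so take the maximum
    }


def latest_progress(table_name, index_name):
    """Return the most recent progress percentage, or None if no datapoints yet."""
    import boto3  # imported here so the query builder works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**progress_query(table_name, index_name))
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Maximum"] if points else None


# Hypothetical usage (requires AWS credentials and an index that is being created):
# print(latest_progress("orders", "by_customer"))
```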
The same document also raises two more reasons why the backfilling process might be slower than you hoped:
Upvotes: 1
Reputation: 89
More of a comment. Also note that the index populates faster than the web console shows. For example, I created an index on a table where I knew precisely how many items should end up in it (25k). While the console still showed 0 items/bytes after ~10 min, API requests using this index already returned the correct number of items.
Upvotes: 0
Reputation: 11
I've been struggling to find some concrete numbers for a similar question.
Going off this post from AWS, the recommendation seems to be to calculate the provisioned capacity from a couple of factors and set that as the provisioned WCU for the GSI when creating it.
However, I've not been able to use the calculated number to get the creation time that the formula predicts.
For example, take a small table in our dev environment that is 24 MB in size with 43,032 items, all with an average size of less than 1 KB. According to the post, allocating roughly 500 WCU to the GSI should give a duration of about 90 seconds. In practice, I've consistently been getting a duration of around 6 minutes, which is an effective rate of around 120 WCU, a lot lower than the provisioned 500.
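The arithmetic behind those numbers can be checked with a short sketch. It assumes each index write of an item up to 1 KB costs 1 WCU, and it uses an average item size of 585 bytes, which is simply 24 MB divided by 43,032 items; the function names are made up for this illustration.

```python
import math


def wcu_per_item(avg_item_bytes):
    """One write of up to 1 KB costs 1 WCU; larger items round up per KB."""
    return max(1, math.ceil(avg_item_bytes / 1024))


def expected_seconds(item_count, avg_item_bytes, provisioned_wcu):
    """Best-case backfill duration if the index used all of its provisioned WCU."""
    return item_count * wcu_per_item(avg_item_bytes) / provisioned_wcu


def effective_wcu(item_count, avg_item_bytes, observed_seconds):
    """Write rate the backfill actually achieved, given the observed duration."""
    return item_count * wcu_per_item(avg_item_bytes) / observed_seconds


items = 43_032
avg_bytes = 585  # assumption: 24 MB / 43,032 items

print(round(expected_seconds(items, avg_bytes, 500)))  # best case at 500 WCU, in seconds
print(round(effective_wcu(items, avg_bytes, 6 * 60)))  # achieved rate over 6 minutes
```

The best case comes out at roughly 86 seconds, while the observed 6 minutes corresponds to roughly 120 writes per second, so the provisioned figure looks more like an upper bound on speed than a prediction.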
Upvotes: 1
Reputation: 19783
It could take a really long time, yes. With 1 WCU you'll be writing one item (of up to 1 KB) per second, so if you have 1 billion such items it will take about 1 billion seconds to create, which is more than 31 years.
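To turn that around, here is a minimal sketch of the WCU needed to finish a backfill within a target time. It assumes items of 1 KB or less (1 WCU per write) and that the backfill can actually use all of the provisioned capacity, which another answer here suggests is optimistic; `required_wcu` is a hypothetical helper, not an AWS API.

```python
import math

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def required_wcu(item_count, avg_item_bytes, target_seconds):
    """Lower bound on index WCU needed to write every item within target_seconds."""
    writes_per_item = max(1, math.ceil(avg_item_bytes / 1024))
    return math.ceil(item_count * writes_per_item / target_seconds)


one_billion = 1_000_000_000

# At 1 WCU: one write per second, so the duration in seconds equals the item count.
print(one_billion / SECONDS_PER_YEAR)  # duration in years

# WCU needed to backfill the same table in about a day instead.
print(required_wcu(one_billion, 1024, SECONDS_PER_DAY))
```

Under these assumptions, finishing in about a day would take on the order of 11,600 WCU on the index, which could be lowered again once the index is active.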
Upvotes: 2