Reputation: 6242
I am having trouble understanding what the difference is between a global secondary index and a table.
Upvotes: 6
Views: 2957
Reputation: 47249
I'll take a stab at this.
One benefit is that a GSI gives you an eventually consistent view of the same data, and it can also act as a sort of "transactional" model, because a single write to the table is all the client has to get right.
Imagine that you want to track user/group relationships. This might not be the best example, but I think it will demonstrate a few points.
Let's say your use cases are that you want to be able to Query all groups for a user, and Query all users for a group. In this simple setup, you would think of having 2 tables:
1. UsersToGroups with hash+range of userId+groupId
2. GroupsToUsers with hash+range of groupId+userId
If you need to make an update to any relationship, a client needs to:
1. Write to the UsersToGroups table (hash: userId, range: groupId)
2. Write to the GroupsToUsers table (hash: groupId, range: userId)
What happens if your 2nd write fails? How do you roll back the first write if the second fails? How do you even know that your 2nd write failed, say if a connection failure happens?
These problems are not fun to deal with.
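To make the failure mode concrete, here is a minimal sketch that uses plain in-memory objects rather than real DynamoDB tables. The FlakyTable class and the add_membership helper are hypothetical names invented for this illustration, and the second write is forced to fail.

```python
class FlakyTable:
    """In-memory stand-in for a table; not a real DynamoDB client."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.items = set()
        self.fail_on_write = fail_on_write

    def put(self, hash_key, range_key):
        if self.fail_on_write:
            raise ConnectionError(f"write to {self.name} failed")
        self.items.add((hash_key, range_key))


def add_membership(users_to_groups, groups_to_users, user_id, group_id):
    # Two independent writes: nothing here makes them atomic.
    users_to_groups.put(user_id, group_id)
    groups_to_users.put(group_id, user_id)


users_to_groups = FlakyTable("UsersToGroups")
groups_to_users = FlakyTable("GroupsToUsers", fail_on_write=True)

try:
    add_membership(users_to_groups, groups_to_users, "alice", "admins")
except ConnectionError:
    # The first write already landed, so the two tables now disagree.
    pass

first_table_has_row = ("alice", "admins") in users_to_groups.items
second_table_has_row = ("admins", "alice") in groups_to_users.items
```

After the failed call, the first table contains the relationship and the second does not, which is exactly the inconsistency the client would have to detect and repair on its own.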
With a GSI, you could have a single table, depending on how you want to manage it. Instead of using 2 tables, let's say I use a single table and a single GSI:
1. Table UsersToGroups with hash+range of userId+groupId
2. GSI GroupsToUsers with hash+range of groupId+userId
If you need to make an update to any relationship, a client needs to:
1. Write to the UsersToGroups table
That is it. You only have to make 1 request. If that write is successful, you can count on the index (eventually) having the same data. Depending on how often you query this index, or how much data you need propagated to it, you can adjust its throughput accordingly.
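As a rough sketch of what the single-table setup could look like, the dictionary below follows the parameter shape of the DynamoDB CreateTable call as exposed by boto3. The string attribute types, the KEYS_ONLY projection, and the capacity numbers are assumptions picked for illustration, not values from the answer.

```python
# Keyword arguments in the shape accepted by boto3's DynamoDB create_table call.
# Attribute types, projection, and capacity values are illustrative assumptions.
create_table_params = {
    "TableName": "UsersToGroups",
    "AttributeDefinitions": [
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "groupId", "AttributeType": "S"},
    ],
    # Base table: hash on userId, range on groupId.
    "KeySchema": [
        {"AttributeName": "userId", "KeyType": "HASH"},
        {"AttributeName": "groupId", "KeyType": "RANGE"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    "GlobalSecondaryIndexes": [
        {
            # The index swaps the two keys: hash on groupId, range on userId.
            "IndexName": "GroupsToUsers",
            "KeySchema": [
                {"AttributeName": "groupId", "KeyType": "HASH"},
                {"AttributeName": "userId", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # The index has its own throughput, tuned separately from the table.
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").create_table(**create_table_params)
```

The index carries its own ProvisionedThroughput entry, which is the knob the answer refers to when it says you can adjust throughput for the index.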
This simple example assumes that userIds and groupIds are unique and that no collisions will happen when they are projected to the index, but I think it does a good job of explaining at least some of the usefulness.
For more information, see the Guidelines for Global Secondary Indexes documentation.
Upvotes: 8
Reputation: 2733
Let's break your question into parts.
1. What is the difference between a global secondary index and a table?
Table: In DynamoDB, a table is just a storage facility for data. Unlike in an RDBMS, it does not have to maintain any constraints or relationships with other tables.
GSI: It is a feature provided by DynamoDB that lets you retrieve data from a table much faster when you look it up by attributes other than the table's own key.
2. Why would I use a global secondary index? Why not just create another table?
As DynamoDB is a NoSQL database, we cannot run queries the same way we do with a traditional RDBMS like Oracle. To query on an attribute, we need an index on that attribute of the table. If we do not create an index, be it a GSI or an LSI, and we want to extract some information, we will have to scan the whole table.
If we create another table instead, we will still need to query that table at some point.
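The scan-versus-index point can be illustrated with a toy model in plain Python. This is not how DynamoDB is implemented internally; the item list, the function names, and the "examined" counter are all made up for the sketch.

```python
from collections import defaultdict

# Toy data: one item per user/group relationship.
items = [
    {"userId": "alice", "groupId": "admins"},
    {"userId": "alice", "groupId": "devs"},
    {"userId": "bob", "groupId": "devs"},
    {"userId": "carol", "groupId": "ops"},
]


def scan_for_group(table_items, group_id):
    """Examine every item and keep the matches, as a full scan would."""
    examined = 0
    matches = []
    for item in table_items:
        examined += 1
        if item["groupId"] == group_id:
            matches.append(item["userId"])
    return matches, examined


def build_index(table_items):
    """Group user ids by groupId, a toy stand-in for an index keyed on groupId."""
    index = defaultdict(list)
    for item in table_items:
        index[item["groupId"]].append(item["userId"])
    return index


def query_index(index, group_id):
    """Touch only the entries stored under the requested key."""
    matches = index.get(group_id, [])
    return list(matches), len(matches)


scan_matches, scan_examined = scan_for_group(items, "devs")
index_matches, index_examined = query_index(build_index(items), "devs")
```

Both approaches return the same users, but the scan looks at every item in the table while the index lookup only touches the entries stored under the requested key.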
3. When a write occurs on a table with a GSI, I have to write to both the table and the index.
Though I am not sure, we can probably assume that the data is not written separately for the table and the index in raw form. They must have done some kind of optimization inside the database, so the cost involved is not exactly the same as that of a plain database write.
Also, we as developers do not need to write to both the GSI and the table. DynamoDB manages that itself; we just need to write to the table.
4. What benefit do I get by using a GSI?
a) A GSI is loosely coupled with the table, unlike an LSI. We can create or delete GSIs separately when required, so they are more flexible than LSIs.
b) As it provides its own hash and range key combination, queries can be done in a more optimal manner.
c) Compared to a full scan of the table (which is unavoidable in the absence of indexes), it is much faster and less costly.
hope it helps :)
Upvotes: 2
Reputation: 154
When a write occurs on a table with a GSI I have to write to both the table and the index. My question then is why not just create another table instead of a global secondary index?
No, you don't need to write to both the table and the GSI. DynamoDB automatically maintains the index for you, i.e. when you write to the table, the GSI is updated automatically.
What benefit do I get by using a GSI?
You get the ability to "query" the data by the GSI key.
A very detailed explanation with plenty of examples is available at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
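As a small sketch of what querying by the GSI key could look like, the helper below builds parameters in the shape of the DynamoDB Query call as exposed by boto3. The table name UsersToGroups, the index name GroupsToUsers, and the string-typed groupId key are borrowed from the example in the earlier answer and are assumptions here, as is the helper name users_in_group_query.

```python
def users_in_group_query(group_id):
    """Build Query parameters that target the GSI instead of the base table."""
    return {
        "TableName": "UsersToGroups",
        # IndexName is what directs the Query at the GSI.
        "IndexName": "GroupsToUsers",
        "KeyConditionExpression": "groupId = :g",
        "ExpressionAttributeValues": {":g": {"S": group_id}},
    }


params = users_in_group_query("admins")

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").query(**params)
```

Reads from a GSI are eventually consistent, so a query issued right after a write may not yet see the new item.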
Upvotes: 1