I'm making a website where users pull images and add annotations to them, and I'm wrestling with the most efficient way to structure the table. Consider the following:
I'm guessing image ID and user ID as the partition and sort keys are the best choices, although that means 1000 items per user, and whenever a new image is added I would need to add an item for every user, which I could probably do easily enough with a secondary index. I'd like to avoid scans entirely, if possible.
If you want a single table, you might consider two types of items in this table:
Image item (one per image, shared by all users):
- Partition key: imgID_xxx
- Range key: img

Annotated image by user item:
- Partition key: userID_xxx
- Range key: imgID_xxx
- Annotation: some annotation...
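As a rough sketch, the two item types could be built as DynamoDB low-level attribute maps (the shape the `Item` argument of a boto3 client's `put_item` expects). The attribute names `hashKey`, `rangeKey`, `isImg`, `annotation` and `imgID` come from the tables in this answer; the helper names and storing `isImg` as the number 1 are assumptions for illustration.

```python
from typing import Dict

# Low-level DynamoDB attribute-value maps, e.g. {"S": "text"} or {"N": "1"}.
AttributeMap = Dict[str, Dict[str, str]]


def image_item(img_id: str) -> AttributeMap:
    """Shared, unannotated image: one item per image, seen by every user."""
    return {
        "hashKey": {"S": img_id},
        "rangeKey": {"S": "img"},
        # Only image items carry isImg, so a GSI keyed on it stays sparse.
        "isImg": {"N": "1"},
    }


def annotation_item(user_id: str, img_id: str, text: str) -> AttributeMap:
    """Per-user annotation, created lazily the first time a user annotates."""
    return {
        "hashKey": {"S": user_id},
        "rangeKey": {"S": img_id},
        "annotation": {"S": text},
        # Duplicated so a GSI keyed on imgID can find all annotations of an image.
        "imgID": {"S": img_id},
    }
```

Adding a new image is then a single `put_item` with `image_item(...)`; nothing per user is written until an annotation exists.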
So initially you'll only have your 1000 unannotated image items, which users can query via a GSI on isImg (hashKey is what I'm calling the partition key here):
hashKey | rangeKey | isImg | ...
img_0001 | img | 1 |
img_0002 | img | 1 |
...
img_1000 | img | 1 |
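Listing the images could look roughly like the sketch below, assuming the GSI is keyed on the `isImg` attribute. The table and index names are hypothetical. Because a single Query response is capped (at 1 MB), the helper follows `LastEvaluatedKey` / `ExclusiveStartKey` until all pages are read; the query function is passed in so it can be a boto3 client's `query` method or a test fake.

```python
from typing import Any, Callable, Dict, List

QueryFn = Callable[..., Dict[str, Any]]


def list_images_request(table_name: str, index_name: str) -> Dict[str, Any]:
    """Query parameters for the sparse GSI that contains only image items."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "isImg = :one",
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }


def query_all(query: QueryFn, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every page, following LastEvaluatedKey until it is absent."""
    items: List[Dict[str, Any]] = []
    request = dict(params)
    while True:
        page = query(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        request["ExclusiveStartKey"] = last_key
```

With boto3 this might be called as `query_all(boto3.client("dynamodb").query, list_images_request("Annotations", "isImg-index"))`, both names being placeholders. Every image item shares `isImg = 1`, so all of them sit under one index partition key; that is fine at 1000 images but worth keeping in mind if the catalogue grows a lot.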
When any user downloads any image, they'll get this common item to start with; the "Annotated image by user" items are only created lazily, after a user annotates an image.
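The lazy fallback can be sketched as two `GetItem` lookups: first the user's own annotated item, then the shared image item. This is a sketch under assumptions: the function name is hypothetical, and `get_item` is anything with the signature of a boto3 client's `get_item`, which omits the `"Item"` key from its response when no item matches.

```python
from typing import Any, Callable, Dict, Optional

GetItemFn = Callable[..., Dict[str, Any]]


def image_for_user(get_item: GetItemFn, table_name: str,
                   user_id: str, img_id: str) -> Optional[Dict[str, Any]]:
    """Return the user's annotated item if it exists, else the shared image item."""
    own = get_item(
        TableName=table_name,
        Key={"hashKey": {"S": user_id}, "rangeKey": {"S": img_id}},
    )
    if "Item" in own:  # the user has already annotated this image
        return own["Item"]
    shared = get_item(
        TableName=table_name,
        Key={"hashKey": {"S": img_id}, "rangeKey": {"S": "img"}},
    )
    return shared.get("Item")  # None if the image itself does not exist
```

A miss costs two round trips; if that matters, `BatchGetItem` can fetch both keys in one request and the caller picks the annotated item when present.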
If a user wants to annotate an image, you will need to write an "Annotated image by user" item, which is partitioned by userID but should also carry an imgID attribute with a GSI on it.
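Putting the base keys and both GSIs together, the table definition might look something like the `CreateTable` parameters below. The table name, the index names, on-demand billing and `ALL` projections are assumptions; only the attribute names come from this answer. Both indexes are sparse: image items have no `imgID` and annotation items have no `isImg`, so each index only holds the items it is meant to serve.

```python
from typing import Any, Dict


def table_definition(table_name: str) -> Dict[str, Any]:
    """CreateTable parameters for the single-table design (on-demand billing)."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in a table or index key schema are declared.
        "AttributeDefinitions": [
            {"AttributeName": "hashKey", "AttributeType": "S"},
            {"AttributeName": "rangeKey", "AttributeType": "S"},
            {"AttributeName": "isImg", "AttributeType": "N"},
            {"AttributeName": "imgID", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "hashKey", "KeyType": "HASH"},
            {"AttributeName": "rangeKey", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                # Image items only: lists every image.
                "IndexName": "isImg-index",
                "KeySchema": [
                    {"AttributeName": "isImg", "KeyType": "HASH"},
                    {"AttributeName": "hashKey", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
            {
                # Annotation items only: groups annotations by image.
                "IndexName": "imgID-index",
                "KeySchema": [
                    {"AttributeName": "imgID", "KeyType": "HASH"},
                    {"AttributeName": "hashKey", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
        ],
    }
```

With boto3 this could be passed as `boto3.client("dynamodb").create_table(**table_definition("Annotations"))`.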
For example, suppose user_111 annotated two images (img_0002 and img_0042) and then user_222 annotated just one image (img_0002):
hashKey | rangeKey | isImg | annotation | imgID |
img_0001 | img | 1 |
img_0002 | img | 1 |
...
img_1000 | img | 1 |
user_111 | img_0002 | | "foo" | img_0002 |
user_111 | img_0042 | | "bar" | img_0042 |
user_222 | img_0002 | | "baz" | img_0002 |
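To sanity-check the access patterns against the example table without touching AWS, here is a simplified in-memory model. It is only a stand-in: a dict keyed by (hashKey, rangeKey), with the imgID GSI emulated by a filter over all items (real DynamoDB would read the index instead of scanning). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# In-memory stand-in for the table: (hashKey, rangeKey) -> other attributes.
Table = Dict[Tuple[str, str], Dict[str, str]]


def build_example() -> Table:
    table: Table = {}
    for n in range(1, 1001):  # the 1000 shared image items
        table[(f"img_{n:04d}", "img")] = {"isImg": "1"}
    # Annotation items exist only for (user, image) pairs actually annotated.
    for user, img, text in [("user_111", "img_0002", "foo"),
                            ("user_111", "img_0042", "bar"),
                            ("user_222", "img_0002", "baz")]:
        table[(user, img)] = {"annotation": text, "imgID": img}
    return table


def annotations_by_user(table: Table, user: str) -> List[str]:
    """Like a base-table Query on hashKey = user; returns the image IDs."""
    return sorted(rk for (hk, rk) in table if hk == user)


def annotations_for_image(table: Table, img_id: str) -> List[str]:
    """Like a Query on the imgID GSI; returns the annotating user IDs."""
    return sorted(hk for (hk, _), attrs in table.items()
                  if attrs.get("imgID") == img_id)
```

Here `annotations_for_image(t, "img_0042")` yields one user and `annotations_for_image(t, "img_0002")` yields two, and the whole example needs only 1003 items rather than one item per user per image.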
This will allow a user to:
- ...
- fetch every annotation of a given image through the imgID GSI, which returns one item for img_0042, or two items for img_0002.

When adding a new image, only a single item needs to be added. Only once a user annotates that image will you need to create the extra item for that user as well.