Reputation: 795
Technologies used: AWS, Lambda, DynamoDB, Python.
I am not very experienced in DynamoDB/NoSQL and my case is the following:
There will be a Lambda running every couple of minutes to get the messages that I need to notify the users about. The Lambda "knows what time it is" and wants to get only the messages of users who want to receive their notifications at this point in time, based on their preferences.
Current DynamoDB table design is the following: user_messages table - Primary Key (Partition Key: user_id, Sort Key: message_id), attributes: message_text, creation_time etc.
My struggle is: how do I design the DB in an optimal way to limit the number of RCUs consumed and the compute time of the Lambda when extracting those messages? It would be simpler if I allowed each user to have only one notification time. I'd just create a notification time attribute and a new GSI with the notification time as the partition key, but this would limit the user too much.
I am not sure how to approach it in the case of multiple notification times per user. I have 2 possible scenarios now:
1. Limit the number of notification times to N, for example 3 max per user, store the preferences in 3 attributes and create 3 GSIs. In that case the Lambda would query the table 3 times each run - this doesn't look elegant and I am concerned about the hard limit on the number of notifications.
The table design would look like this in that case: user_messages table - Primary Key (Partition Key: user_id, Sort Key: message_id), attributes: message_text, creation_time etc., GSI_1 (notification_time_1), GSI_2 (notification_time_2), GSI_3 (notification_time_3).
2. Create a separate table with user preferences, like Partition Key: notification_time, attribute: user_id.
In that case the Lambda would have to get all user_ids for a particular notification time and iterate over user_messages_table to get the user messages, which means if I have 1000 users to notify I'd need to query user_messages_table 1000 times. That doesn't look good from a performance point of view and will consume a lot of RCUs.
Actually I am stuck here, as none of the above solutions seems optimal to me.
Do you see any other approach I could take here?
Upvotes: 0
Views: 1378
Reputation: 13108
My understanding is that you're gathering messages for each user in a table and, depending on the user, you want to send these notifications at different points in time.
Update: There are two solutions below; I'm having a hard time deciding between them, but I'd probably go for #2.
I'd probably go for a single-table design like this:

| PK | SK | GSI1PK | GSI1SK | type | attributes |
|---|---|---|---|---|---|
| U#1 | NC#1 | NCT#08:30 | U#1NC#1 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 08:30} |
| U#1 | NC#2 | NCT#17:30 | U#1NC#2 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 17:30} |
| U#1 | MSG#2021-02-27...#ID#123 | | | MESSAGE | {message_id: 123, create_time: 2021-02-27T09:30:00Z, body: bla} |
| U#1 | MSG#2021-02-27...#ID#789 | | | MESSAGE | {message_id: 789, create_time: 2021-02-27T10:30:00Z, body: blub} |
| U#2 | NC#1 | NCT#10:15 | U#2NC#1 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 10:15} |
| U#2 | MSG#2021-02-27...#ID#654 | | | MESSAGE | {message_id: 654, create_time: 2021-02-27T10:30:00Z, body: test} |
PK is the partition key, SK the sort key, and GSI1PK and GSI1SK are the partition and sort keys of a global secondary index GSI1.
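To make the item shapes concrete, here is a small sketch in Python (the question's Lambda runtime). The helper names are illustrative, and the full sort-key timestamps are assumed to be the `create_time` values from the table above; you would pass the resulting dicts to boto3's `put_item`.

```python
def notification_config_item(user_id: str, config_id: str, time_of_day_utc: str) -> dict:
    """A NOTIFICATION_CONFIGURATION row; the GSI1 keys let the sender find it by time."""
    return {
        "PK": f"U#{user_id}",
        "SK": f"NC#{config_id}",
        "GSI1PK": f"NCT#{time_of_day_utc}",
        "GSI1SK": f"U#{user_id}NC#{config_id}",
        "type": "NOTIFICATION_CONFIGURATION",
        "time_of_day_in_utc": time_of_day_utc,
    }


def message_item(user_id: str, message_id: str, create_time: str, body: str) -> dict:
    """A MESSAGE row; it carries no GSI1 keys, so it never appears in GSI1."""
    return {
        "PK": f"U#{user_id}",
        "SK": f"MSG#{create_time}#ID#{message_id}",
        "type": "MESSAGE",
        "message_id": message_id,
        "create_time": create_time,
        "body": body,
    }


# The first message row from the table above:
print(message_item("1", "123", "2021-02-27T09:30:00Z", "bla")["SK"])
# → MSG#2021-02-27T09:30:00Z#ID#123
```

Because messages have no GSI1 attributes, the index stays sparse: it contains only the notification configurations.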
Your Lambda function now has to perform the following steps:

1. Query GSI1 with `GSI1PK = NCT#<time>` to find every user who wants to be notified at this point in time.
2. For each returned user, query the base table with `PK = U#<user-id>` and `SK begins_with MSG` to fetch their messages.

Since step 1 only needs the keys, you can use a KEYS_ONLY projection for GSI1, which saves on storage and RCU costs.
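The two queries can be sketched as low-level DynamoDB client parameters; the table name `app_table` and index name `GSI1` are assumptions, and each dict would be passed to `boto3.client("dynamodb").query(**params)`:

```python
def users_to_notify_query(time_of_day_utc: str) -> dict:
    """Step 1: all notification configurations scheduled for this time, via GSI1."""
    return {
        "TableName": "app_table",  # assumed table name
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :t",
        "ExpressionAttributeValues": {":t": {"S": f"NCT#{time_of_day_utc}"}},
    }


def user_messages_query(user_pk: str) -> dict:
    """Step 2: that user's messages from the base table."""
    return {
        "TableName": "app_table",
        "KeyConditionExpression": "PK = :u AND begins_with(SK, :m)",
        "ExpressionAttributeValues": {":u": {"S": user_pk}, ":m": {"S": "MSG#"}},
    }
```

With a KEYS_ONLY projection, step 1 still returns the table's key attributes, so the `PK` of each item is exactly the `U#<user-id>` you need for step 2.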
You'll have to run a query for every user with a notification configuration when you send the messages, but the actual RCUs consumed should be fairly limited; it will just be a lot of requests.
You could also extend this design to store historical messages if you keep track of when the last notification was sent out to each user. You'd then have an additional read for that attribute, but could change the per-user message query to a between query.
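Assuming you store a `last_notified_at` timestamp per user (an attribute name I'm making up), that per-user query could look like this in the same client-parameter style:

```python
def unsent_messages_query(user_pk: str, last_notified_at: str, now: str) -> dict:
    """Only messages created since the last notification.

    Works because the MSG#<create_time>#ID#<id> sort key sorts chronologically.
    The bounds are lexicographic and inclusive; "~" (0x7E) sorts after all
    alphanumeric characters, so the upper bound covers any message id at `now`.
    """
    return {
        "TableName": "app_table",  # assumed table name
        "KeyConditionExpression": "PK = :u AND SK BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":u": {"S": user_pk},
            ":lo": {"S": f"MSG#{last_notified_at}"},
            ":hi": {"S": f"MSG#{now}#ID#~"},
        },
    }
```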
This may be better, although it might also result in a hot partition under write load.
| PK | SK | type | attributes |
|---|---|---|---|
| U#1 | NC#1 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 08:30} |
| U#1 | NC#2 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 17:30} |
| SM#17:30 | U#1#ID#123 | SCHEDULED_MESSAGE | {message_id: 123, create_time: 2021-02-27T09:30:00Z, body: bla} |
| SM#17:30 | U#1#ID#789 | SCHEDULED_MESSAGE | {message_id: 789, create_time: 2021-02-27T10:30:00Z, body: blub} |
| U#2 | NC#1 | NOTIFICATION_CONFIGURATION | {time_of_day_in_utc: 10:15} |
| SM#10:15 | U#2#ID#654 | SCHEDULED_MESSAGE | {message_id: 654, create_time: 2021-02-27T10:30:00Z, body: test} |
When you add a new message, you do the following:

1. Query `PK = U#<id>` and `SK begins_with NC` to get all of the user's notification configurations.
2. Write one SCHEDULED_MESSAGE item per configured time under `PK = SM#<time>`.

The Lambda that is supposed to send messages can now query `PK = SM#<time>` to get all messages that need to be sent out now.

This way sending messages is cheaper, but changes to a notification period are applied with a delay. Alternatively, on changes to a user's notification periods, you'd have to update the already scheduled messages.
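Both sides of this design can be sketched with the same illustrative client parameters and item shapes (`app_table` is again an assumed name):

```python
def user_configs_query(user_id: str) -> dict:
    """On message creation: read the user's notification configurations."""
    return {
        "TableName": "app_table",  # assumed table name
        "KeyConditionExpression": "PK = :u AND begins_with(SK, :nc)",
        "ExpressionAttributeValues": {":u": {"S": f"U#{user_id}"}, ":nc": {"S": "NC#"}},
    }


def scheduled_message_item(user_id: str, message_id: str, time_of_day_utc: str,
                           create_time: str, body: str) -> dict:
    """One SCHEDULED_MESSAGE copy, written per configured notification time."""
    return {
        "PK": {"S": f"SM#{time_of_day_utc}"},
        "SK": {"S": f"U#{user_id}#ID#{message_id}"},
        "type": {"S": "SCHEDULED_MESSAGE"},
        "message_id": {"S": message_id},
        "create_time": {"S": create_time},
        "body": {"S": body},
    }


def due_messages_query(time_of_day_utc: str) -> dict:
    """In the sending Lambda: everything due now lives in a single partition."""
    return {
        "TableName": "app_table",
        "KeyConditionExpression": "PK = :t",
        "ExpressionAttributeValues": {":t": {"S": f"SM#{time_of_day_utc}"}},
    }
```

The single `SM#<time>` partition per notification time is what makes sending cheap here, and also what could make it a hot partition under heavy writes.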
Upvotes: 2