Ignatz Buttergood
Ignatz Buttergood

Reputation: 1

Creating a list of partition keys in a single-table database

I am new to dynamoDB and non-SQL databases and am struggling with design.

I'm trying to model an application which departments and department weekly schedules in a school. Ultimately, I'd like to provide a front-end where a user could perform CRUD operations on departments -- initially it's fine if the department has only a name. Department schedules are represented by an HTML fragment that could be embedded in a web page. Department schedules change for different times of the year and also change on weeks with holidays. For a given department I'd like to be able to store templates which define the most common weekly schedule, something along the lines of "2021 Summer Regular Hours". These weekly templates could then be copied for a department for a week's actual schedule, say "2021 Hours May 20210509".

The data accesses my application needs are (listed from most common to least):

  1. Get weekly schedule for department for week of
  2. CRUD weekly schedule for department.
  3. CRUD weekly template for department.
  4. List departments.
  5. CRUD department.

I've been fiddling with a local dynamoDB database with a single table. I've ended up with this:

PK           SK                                  Attributes
============ ===============                     ==========
DEPT#Library PROFILE#Library                     { "departmentName": "Library", ... }
DEPT#Library TEMPLATE#2021 Summer Regular Hours  { "departmentName": "Library",
                                                   "templateName": "2021 Summer Regular Hours",
                                                   "templateDisplay": "<table><tr><td>Sunday</td><td>1:00pm-5:00pm</td></tr>...</table>"}
DEPT#Library SCHEDULE#2021-05-09                 { "departmentName": "Library",
                                                   "scheduleName": "2021-05-09",
                                                   "scheduleDisplay": "<table>...</table>"}

I was going to name my partition key and sort key as "PK" and "SK" after an example I saw of designing a single table to hold a variety of information. I'm not sure if this is good form or not.

I think that this could work well for most of my data access patterns. But I've been having a problem coming up with a good solution to list departments. What I've come up with is to define a global secondary index, departmentNameIndex, with AttributeName of departmentName, KeyType of HASH, and ProjectionType of KEYS_ONLY. This does indeed allow me to scan the table via the index and retrieve all of the department names. Is it bad to do a "scan" on a table in dynamoDB? Is there a simpler way that I could achieve the same end?

Thanks.

Upvotes: 0

Views: 105

Answers (1)

Seth Geoghegan
Seth Geoghegan

Reputation: 5747

You are doing a great job with your first NoSQL data modeling exercise, well done!

I was going to name my partition key and sort key as "PK" and "SK" ... I'm not sure if this is good form or not.

Not only is this good form, I'd argue it's a best practice.

I've been having a problem coming up with a good solution to list departments. What I've come up with is to define a global secondary index, departmentNameIndex, with AttributeName of departmentName, KeyType of HASH, and ProjectionType of KEYS_ONLY. This does indeed allow me to scan the table via the index and retrieve all of the department names.

This is a perfectly reasonable approach. The pattern you are describing is known as a "sparse index" because not every item in your table will have a departmentName. As a result, your secondary index is a subset of the data in the main table (as opposed to every item in your table being replicated in the secondary index).

Is it bad to do a "scan" on a table in dynamoDB?

Scan by itself is neither good nor bad. It's a powerful tool that can be used to do things getItem and query cannot. Like any tool, it can be dangerous in the wrong hands. I'd argue that scan should be used sparingly and only if you are certain you understand what you are doing.

There are a few use cases where using scan is perfectly reasonable. A sparse index is one such case. Another decent use case is when you have infrequently run operations (e.g. weekly metrics gathering/quarterly reports/etc) that don't justify the additional data modeling "cost" to work into your data model.

Is there a simpler way that I could achieve the same end?

DynamoDB gives you a ton of flexibility in how you define your data model. You'll likely go through several iterations before you settle on a particular data model; it's a process. You're on the right path, keep going!

Upvotes: 2

Related Questions