Reputation: 357
I'm new at DynamoDB technologies but not at NoSQL (I've already done some project using Firebase).
Read that a DynamoDB best practice is one table per application I've been having a hard time on how to design my 1 to N relationship.
I have this entity (pseudo-json):
{
machineId: 'HASH_ID'
machineConfig: /* a lot of fields */
}
A machineConfig
is unique for each machine and can change rarely and only by an administration (no consistency issue here).
The issue is that I have to manage a log of data from the sensors of each machine. The log is described as:
{
machineId: 'HASH_ID',
sensorsData: [
/* Huge list of: */
{ timestamp: ..., data: /* lot of fields */ },
...
]
}
I want to keep my machineConfig
in one place. Log list can't be insert into the machine entity because it's a continuous stream of data taken over time.
Furthermore, I don't understand which could be the composite key, the partition key obviously is the machineId
, but what about the order key?
How to design this relationship taking into account the potential dimensions of data?
Upvotes: 1
Views: 265
Reputation: 3035
You could do this with 1 table. The primary key could be (machineId, sortKey)
where machineId
is the partition key and sortKey
is a string attribute that is going to be used to cover the 2 cases. You could probably come up with a better name.
To store the machineConfig
you would insert an item with primary key (machineId, "CONFIG")
. The sortKey
attribute would have the constant value CONFIG
.
To store the sensorsData
you could use the timestamp
as the sortKey
value. You would insert a new item for each piece of sensor data. You would store the timestamp
as a string (as time since the epoch, ISO8601, etc)
Then to query everything about a machine you would run a Dynamo query specifying just the machineId
partition key - this would return many items including the machineConfig
and the sensor data.
To query just the machineConfig
you would run a Dynamo query specifying the machineId
partition key and the constant CONFIG
as the sortKey
value
To query the sensor data you could specify an exact timestamp or a timestamp range for the sortKey
. If you need to query the sensor data by other values then this design might not work as well.
Editing to answer follow up question:
You would have to resort to a scan with a filter to return all machines with their machineId
and machineConfig
. If you end up inserting a lot of sensor data then this will be a very expensive operation to perform as Dynamo will look at every item in the table. If you need to do this you have a couple of options.
If there are not a lot of machines you could insert an item with a primary key like ("MACHINES", "ALL")
and a list of all the machineIds
. You would query on that key to get the list of machineIds
, then you would do a bunch of queries (or a batch get) to retrieve all the related machineConfigs
. However since the max Dynamo item size is 400KB you might not be able to fit them all.
If there are too many machines to fit in one item you could alter the above approach a bit and have ("MACHINES", $machineIdSubstring)
as a primary key and store chunks of machineIds
under each sort key. For example, all machineIds
that start with 0 go in ("MACHINES", "0")
. Then you would query by each primary key 0-9, build a list of all machineIds
and query each machine as above.
Alternatively, you don't have to put everything in 1 table - it is just a guideline that fits a lot of use cases. If there are too many machines to fit in less than 400KB but there aren't tens of thousands and you aren't trying to query all of them all the time, you could have a separate table of machineId
and machineConfig
that you resort to scanning when necessary.
Upvotes: 2