Reputation: 919
I have IoT system where devices send data that doesn't strictly adhere to any specific schema, but there is always "telemetry" object.
{
"deviceId": "SimulatedDevice",
"telemetry":
{
"temperature": 23.34,
"status":
{
"health": "ok"
}
}
}
Telemetry data gets flattened into simple key-value dictionary.
{
"temperature": 23.34,
"status.health": "ok"
}
The key is always string. The value is always string, numeric, boolean or null.
Question
What would be the best approach to store this sort of data?
Option 1
Use JSONB/JSON for value. You can store anything in value as long as it is valid JSON.
CREATE TABLE telemetry
(
time TIMESTAMPTZ NOT NULL,
deviceId TEXT NOT NULL,
key TEXT NOT NULL,
value JSONB
)
Option 2
Create field for every possible value type. If value is Numeric, I insert it into value_numeric column and make other columns NULL. If the value is NULL, then all columns are NULL.
CREATE TABLE telemetry
(
time TIMESTAMPTZ NOT NULL,
deviceId TEXT NOT NULL,
key TEXT NOT NULL,
value_string TEXT,
value_numeric NUMERIC,
value_boolean BOOLEAN
)
IoT data can grow very large, potentially hundreds of millions of rows. Which option, in long run, would yield the best performance?
-- Option 1
SELECT * FROM telemetry
WHERE time >= '...' AND time < '...' AND deviceId = '...' AND value > '75'
-- Option 2
SELECT * FROM telemetry
WHERE time >= '...' AND time < '...' AND deviceId = '...' AND value_numeric > 75
I imagine using JSONB/JSON column (option 1) adds more overhead, but maybe it is very negligible? Are there any general guidelines related to what I am trying to achieve?
Upvotes: 1
Views: 153
Reputation: 6104
I wrote a best practices blog post about it a while ago. It explains the two formats you presented (narrow and medium, as well as wide), and gives pros and cons accordingly.
I think you might find that interesting: https://www.timescale.com/blog/best-practices-for-time-series-data-modeling-narrow-medium-or-wide-table-layout-2/
Pure JSONB may bite you eventually due to the additional overhead of parsing / extracting the value, and when you add compression into the game, you need to make sure that all deciding factors (when selecting) are indexed, so you only decompress / read the JSONB column when really necessary.
I hope this helps :)
Upvotes: 2