Reputation: 113
I wonder if someone could take a minute out of their day to give some suggestion on my database structure design.
I have sensor data (e.g temperature, humidity ...) with time series format (10Hz) which are installed in different floors of different houses of different cities. So let say something like this:
*City Paris-->House A -->Floor 1 --> Sensor Humidity & temp --> csv file with time series for hours, days, years
City Paris-->House B -->Floor 3 --> Sensor Humidity --> csv file with time series for hours, days, years*
So now I would like to answer these questions: 1- What would be the most efficient method to store the data A sql database? 2- Would it make sense to have timestamp data stored in sql database but the sensor data in CSV file and then link them them to sql database? 3- What about the scalability and possibility to add new sensors?
Many thanks for your help and suggestion in advance,
Upvotes: 1
Views: 1718
Reputation: 2619
At least you should not save the csv in the database as a varchar or text at once. You should break down eveything in as small parts as possible. My suggestion is you first create a table like this
CREATE TABLE measurements (measurement_id INT PRIMARY KEY, floor_id INT, type VARCHAR(50), value FLOAT)
Then you create a table for floors
CREATE TABLE floors (floor_id INT PRIMARY KEY, building_id INT, floor_name INT)
And at least the connection to the building
CREATE TABLE buildings (building_id INT PRIMARY KEY, building_name VARCHAR(200), building_city VARCHAR(200))
You should create foreign keys from the floors.floor_id
to measurements.floor_id
and the buildings.building_id
to floor.building _id
.
You can even break down into more tables to have cities and/or addresses in own once if you like.
Upvotes: 2
Reputation: 81
If your objective is to run time-series analytics, I would recommend to break down your data so that each reading is in one row and to use a time-series database.
The schema proposed earlier is good. But I personally find storing the data in 3 tables too complex as you need to write / check constraints across 3 different tables, and most of your queries will require JOIN clauses.
There are ways to make this schema simpler, for example by leveraging the symbol
type in QuestDB. Symbol stores repetitive strings as a map of integers. On the surface, you are manipulating strings, but the storage cost and operation complexity is that of an int.
This means you can store all your data in a single, more simple table, with no performance or storage penalty. And this would simplify both ingestion as you write into only one table, and queries by removing the need to perform multiple joins.
Here is what the DDL would look like.
CREATE TABLE measurements (
id INT,
ts TIMESTAMP,
sensor_name SYMBOL,
floor_name SYMBOL,
building_name SYMBOL,
building_city SYMBOL,
type SYMBOL,
value DOUBLE
) timestamp (ts)
If you want to add more sensors or buildings, all you need to do is write to the same table.
Upvotes: 2