Reputation: 186
I am new to designing databases. I come from a front end background. I am looking to design a database that stores performance metrics of wind turbines. I was given an excel file. The list of metrics is nearly 200 and you can see the first few in this image.
I can't think of the best way to represent this data in a database. My first thought was to import this table as is into a database table and add a turbine-id column to it. my second through was to create a table for each metric and add a turbine-id column to each of those tables. What do you guys think. What is the best way for me to store the data that would set me up with a performant database. Thank you for your help and input
Upvotes: 0
Views: 75
Reputation: 7837
You want one table to represent turbines (things true of the turbine, like its location) and one or more of turbine metrics that arrive over time. If different groups of metrics arrive at different intervals, put them in different tables.
One goal I would have would be to minimize the number of nullable columns. Ideally, every column is defined NOT NULL, and invalid inputs are set aside for review. What is and is not nullable is controlled by promises made by the system supplying the metrics.
That's how it's done: every table has one or more keys that uniquely identify a row, and all non-key columns are information about the entity defined by the row.
It might seem tempting and "more flexible" to use one table of name-value pairs, so you never have to worry about new properties if the feed changes. That would be a mistake, though (a classic one, which is why I mention it). It's actually not more flexible, because changes upstream will require changes downstream, no matter what. Plus, if the upstream changes in ways that aren't detected by the DBMS, they can subtly corrupt your data and results.
By defining as tight a set of rules about the data as possible in SQL, you guard against missing, malformed, and erroneous inputs. Any verification done by the DBMS is verification that the application can skip, and that no application will be caught by.
For example, you're given min/max values for wind speed and so on. These promises can form constraints in the database. If you get negative wind speed, something is wrong. It might be a sensor problem or (more likely) a data alignment error because a new column was introduced or the input was incorrectly parsed. Rather than put the wind direction mistakenly in the wind speed column, the DBMS rejects the input, and someone can look into what went wrong.
Don't forget to have fun. You have an opportunity to create a new database in a growing industry, and learn about database technology and theory at the same time. Doesn't happen every day!
Upvotes: 1
Reputation: 50017
One way to do it would be something like this:
TURBINE
ID_TURBINE INTEGER PK
LATITUDE DECIMAL
LONGITUDE DECIMAL
METRIC
ID_METRIC INTEGER PK
METRIC_NAME VARCHAR UNIQUE
VALUE_TYPE VARCHAR
Allowed values = ('BOOLEAN', 'PERCENTAGE', 'INTEGER', 'DOUBLE', 'STRING')
TURBINE_METRIC
ID_TURBINE_METRIC INTEGER PK
ID_TURBINE INTEGER
FOREIGN KEY TO TURBINE
METRIC_NAME VARCHAR
FOREIGN KEY TO METRIC
BOOLEAN_VALUE BOOLEAN
PERCENTAGE_VALUE DOUBLE
INTEGER_VALUE INTEGER
DOUBLE_VALUE DOUBLE
STRING_VALUE VARCHAR
Flesh this out however you need it to be. I have no idea how long your VARCHAR fields should be, etc, but this allows you flexibility in terms of which metrics you store for each turbine. I suppose you could make the LATITUDE and LONGITUDE metrics as well - I just added them to the TURBINE table to show that there may be fixed info which is best stored as part of the TURBINE table.
Upvotes: 1