Rice Junkie
Rice Junkie

Reputation: 186

New to database design (Postgres) seeking advice

I am new to designing databases. I come from a front end background. I am looking to design a database that stores performance metrics of wind turbines. I was given an excel file. The list of metrics is nearly 200 and you can see the first few in this image. enter image description here

I can't think of the best way to represent this data in a database. My first thought was to import this table as is into a database table and add a turbine-id column to it. my second through was to create a table for each metric and add a turbine-id column to each of those tables. What do you guys think. What is the best way for me to store the data that would set me up with a performant database. Thank you for your help and input

Upvotes: 0

Views: 75

Answers (2)

James K. Lowden
James K. Lowden

Reputation: 7837

You want one table to represent turbines (things true of the turbine, like its location) and one or more of turbine metrics that arrive over time. If different groups of metrics arrive at different intervals, put them in different tables.

One goal I would have would be to minimize the number of nullable columns. Ideally, every column is defined NOT NULL, and invalid inputs are set aside for review. What is and is not nullable is controlled by promises made by the system supplying the metrics.

That's how it's done: every table has one or more keys that uniquely identify a row, and all non-key columns are information about the entity defined by the row.

It might seem tempting and "more flexible" to use one table of name-value pairs, so you never have to worry about new properties if the feed changes. That would be a mistake, though (a classic one, which is why I mention it). It's actually not more flexible, because changes upstream will require changes downstream, no matter what. Plus, if the upstream changes in ways that aren't detected by the DBMS, they can subtly corrupt your data and results.

By defining as tight a set of rules about the data as possible in SQL, you guard against missing, malformed, and erroneous inputs. Any verification done by the DBMS is verification that the application can skip, and that no application will be caught by.

For example, you're given min/max values for wind speed and so on. These promises can form constraints in the database. If you get negative wind speed, something is wrong. It might be a sensor problem or (more likely) a data alignment error because a new column was introduced or the input was incorrectly parsed. Rather than put the wind direction mistakenly in the wind speed column, the DBMS rejects the input, and someone can look into what went wrong.

Don't forget to have fun. You have an opportunity to create a new database in a growing industry, and learn about database technology and theory at the same time. Doesn't happen every day!

Upvotes: 1

One way to do it would be something like this:

TURBINE
  ID_TURBINE         INTEGER PK
  LATITUDE           DECIMAL
  LONGITUDE          DECIMAL

METRIC
  ID_METRIC          INTEGER PK
  METRIC_NAME        VARCHAR UNIQUE
  VALUE_TYPE         VARCHAR
    Allowed values = ('BOOLEAN', 'PERCENTAGE', 'INTEGER', 'DOUBLE', 'STRING')

TURBINE_METRIC
  ID_TURBINE_METRIC        INTEGER PK
  ID_TURBINE               INTEGER
     FOREIGN KEY TO TURBINE
  METRIC_NAME              VARCHAR
     FOREIGN KEY TO METRIC
  BOOLEAN_VALUE            BOOLEAN
  PERCENTAGE_VALUE         DOUBLE
  INTEGER_VALUE            INTEGER
  DOUBLE_VALUE             DOUBLE
  STRING_VALUE             VARCHAR

Flesh this out however you need it to be. I have no idea how long your VARCHAR fields should be, etc, but this allows you flexibility in terms of which metrics you store for each turbine. I suppose you could make the LATITUDE and LONGITUDE metrics as well - I just added them to the TURBINE table to show that there may be fixed info which is best stored as part of the TURBINE table.

Upvotes: 1

Related Questions