PileOfBirds

Reputation: 11

Which is better in a database: one large table with columns that are often NULL, or many different tables?

Context: I work for a company that stores and analyzes data from inertial measurement units (IMUs). My teammates and I are considering creating and maintaining a DB for all the data we gather for analysis purposes (it is currently saved in .csv files).

Each device we gather data from has a slightly different output. It's safe to assume most devices will output the following values: Gx, Gy, Gz, Ax, Ay, Az, Temperature

But some units output one or more additional types of data. For example: Gx, Gy, Gz, Ax, Ay, Az, High-G Ax, Temperature

To move away from the .csv files, we'll need to have a table (or multiple tables) to store the measured data. None of us have any experience in creating or maintaining a DB, so we're unsure of what would be the best way to implement it.

One approach could be to create a table for all unit types that includes all common measurement axes and an "Other" column for cases where the unit has a measurement axis that is not specified. The columns would look something like: RunID, Gx, Gy, Gz, Ax, Ay, Az, HighG_Ax, Temperature, Other

In this case I expect the High-G Ax column to often be filled with NULL values, since many of the units used for measurements don't utilize this sensor type. Same goes for the "Other" column.
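To make the single-table option concrete, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module purely as a stand-in for whatever DB we end up using; the table name `measurements`, the REAL column types and the sample values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in.
conn = sqlite3.connect(":memory:")

# One wide table for every unit type. HighG_Ax and Other are nullable
# because most units do not report them.
conn.execute("""
    CREATE TABLE measurements (
        RunID       INTEGER NOT NULL,
        Gx REAL, Gy REAL, Gz REAL,
        Ax REAL, Ay REAL, Az REAL,
        HighG_Ax    REAL,
        Temperature REAL,
        Other       REAL
    )
""")

# A unit with only the common axes: the unlisted columns default to NULL.
conn.execute(
    "INSERT INTO measurements (RunID, Gx, Gy, Gz, Ax, Ay, Az, Temperature) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 0.1, 0.2, 0.3, 0.0, 0.0, 9.8, 25.0),
)

# A unit that also reports High-G Ax (sample value is invented).
conn.execute(
    "INSERT INTO measurements "
    "(RunID, Gx, Gy, Gz, Ax, Ay, Az, HighG_Ax, Temperature) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (2, 0.1, 0.2, 0.3, 0.0, 0.0, 9.8, 120.5, 26.0),
)

wide_rows = conn.execute(
    "SELECT RunID, HighG_Ax, Other FROM measurements ORDER BY RunID"
).fetchall()
print(wide_rows)  # NULL comes back as None in Python
```

Here the first run has NULL in both HighG_Ax and Other, and the second has NULL only in Other.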

The biggest issue I can see with this approach is that if we come across a unit type that's ENTIRELY different, i.e. one with 3+ measurement axes that weren't pre-included in this table, we'd have to retroactively add columns to the table and fill them with NULL for all previous entries.
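A rough sketch of what that retroactive change might look like, again assuming SQLite via Python's sqlite3 as a stand-in. The new axes `Mx`, `My`, `Mz` are hypothetical names, and the table is cut down to three columns to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down version of the wide table with one existing row.
conn.execute(
    "CREATE TABLE measurements (RunID INTEGER NOT NULL, Gx REAL, Temperature REAL)"
)
conn.execute("INSERT INTO measurements VALUES (1, 0.1, 25.0)")

# A new unit type shows up with three axes the table does not have.
# Mx, My, Mz are hypothetical column names for illustration.
for col in ("Mx", "My", "Mz"):
    conn.execute(f"ALTER TABLE measurements ADD COLUMN {col} REAL")

# The row inserted before the change now reads NULL for the new columns.
old_row = conn.execute(
    "SELECT * FROM measurements WHERE RunID = 1"
).fetchone()

# Column names after the change, read from SQLite's table_info pragma.
columns = [info[1] for info in conn.execute("PRAGMA table_info(measurements)")]
print(columns, old_row)
```

In SQLite, at least, the old rows do not need to be rewritten by hand; they simply read as NULL in the added columns. Whether this is cheap on a very large table in other databases is something I don't know.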

The other option is to create a different table for each unit type with columns that match the output data. This ensures no NULL values at all, but means we have at the very least 6-7 different tables for measurements where most of the columns are the same.

So I'd have one table that has RunID, Gx, Gy, Gz, Ax, Ay, Az, HighG_Ax, Temperature and at least one other table that has RunID, Gx, Gy, Gz, Ax, Ay, Az, Temperature. But I'd also be able to handle many different types of units with different sensors and measurement axes without having to go back and alter tables retroactively.
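As a sketch of the table-per-unit-type option, assuming SQLite via Python's sqlite3 as a stand-in: the table names `measurements_standard` and `measurements_highg` and the sample values are hypothetical. It also shows what reading a shared column across both tables could look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns shared by every unit type.
common = "RunID INTEGER NOT NULL, Gx REAL, Gy REAL, Gz REAL, Ax REAL, Ay REAL, Az REAL"

# One table per unit type; table names are hypothetical.
conn.execute(f"CREATE TABLE measurements_standard ({common}, Temperature REAL)")
conn.execute(
    f"CREATE TABLE measurements_highg ({common}, HighG_Ax REAL, Temperature REAL)"
)

# Each row fills every column of its own table, so nothing is NULL.
conn.execute(
    "INSERT INTO measurements_standard VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 0.1, 0.2, 0.3, 0.0, 0.0, 9.8, 25.0),
)
conn.execute(
    "INSERT INTO measurements_highg VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (2, 0.1, 0.2, 0.3, 0.0, 0.0, 9.8, 120.5, 26.0),
)

# Reading a shared column for all runs needs one SELECT per table.
all_temps = conn.execute("""
    SELECT RunID, Temperature FROM measurements_standard
    UNION ALL
    SELECT RunID, Temperature FROM measurements_highg
    ORDER BY RunID
""").fetchall()
print(all_temps)
```

With 6-7 tables, a query like the last one would need one SELECT branch per table.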

These tables could get very large very quickly considering the amount of data we gather, so I'm trying to avoid mistakes that would require a complete redesign in the near future.

Which solution is best-practice? What are the possible downsides to consider?

Thank you in advance :)

Upvotes: 1

Views: 67

Answers (0)
