pelelter

Reputation: 675

PostgreSQL - What should I put inside a JSON column

The data I want to store has these characteristics:

Here's how it would look in a regular table:

Table

I'm having trouble figuring out which would be the better way to store this data in a database for this situation.

Below are the ideas I already had:

So my questions are:

  1. Is one of these solutions (or one I have not thought of) better for this case?
  2. Are there other factors, other than the ones presented here, that I should consider to make this decision?

Upvotes: 2

Views: 371

Answers (3)

Laurenz Albe

Reputation: 246063

Unless you have very many columns (~ 100), it is usually better to use normal columns. NULL values don't take any storage space in PostgreSQL.

On the other hand, if you have queries that can use any of these columns in the WHERE condition with an = comparison, a single GIN index on a jsonb column might be better than many B-tree indexes, because maintaining many B-tree indexes would cost more.
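As a sketch of this approach (table and column names here are hypothetical, not from the question):

```sql
-- Sparse attributes collected in a single jsonb column
CREATE TABLE item (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    attrs jsonb
);

-- One GIN index supports containment queries on any key
CREATE INDEX item_attrs_gin ON item USING gin (attrs);

-- Equality on any attribute can be written with the @> containment
-- operator, which this GIN index can serve
SELECT * FROM item WHERE attrs @> '{"color": "red"}';
```

With plain columns, each column that appears in a WHERE clause would typically need its own B-tree index to get comparable lookups.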

The definitive answer depends on the SQL statements that you plan to run on that table.

Upvotes: 2

Stephan Stieger

Reputation: 1

It depends on how much data you want to store, but as long as the set of columns is finite, it shouldn't make a big difference whether it contains a lot of NULLs or not.

Upvotes: 0

Gordon Linoff

Reputation: 1269443

You have laid out the three options pretty well. Things to consider are:

  • Performance
  • Data size
  • Ease of maintenance
  • Flexibility
  • Security

Note that you don't even allude to security considerations. But security at the table level is usually a tad simpler than at the column level and might be important for regulated data such as PII (personally identifiable information).

The primary strength of the JSON solution is flexibility. It is easy to add new columns. But you don't need that. JSON has a cost in data size and data type flexibility (notably JSON doesn't support date/times explicitly).
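The date/time limitation shows up in practice as casts on every read. A minimal sketch, assuming the hypothetical `item` table with a jsonb `attrs` column:

```sql
-- JSON has no native timestamp type, so a date/time value is stored
-- as a string and must be cast back to timestamptz on every read
SELECT (attrs ->> 'created_at')::timestamptz AS created_at
FROM item
WHERE attrs ? 'created_at';  -- only rows where the key exists
```

A regular `timestamptz` column would avoid both the cast and the risk of malformed string values.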

A multiple-table solution requires duplicating the primary key but may result in much less storage overall if the columns really are sparse. Whether it does may also depend on the data type: a NULL string, for instance, can occupy less space than a NULL float in a table record.

The joins on multiple tables will be 1-1 on primary keys. These should be pretty fast.

What would I do? Unless the answer is obvious, I would dump the data into a single table with a bunch of columns. If that table starts to get unwieldy, then I would think about splitting it into separate tables -- but still have one table for the common columns. The details of one or multiple tables can be hidden behind a view.
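The split-plus-view layout could look like this (all names hypothetical, as a sketch of the idea):

```sql
-- Common columns stay in one table
CREATE TABLE product (
    id   bigint PRIMARY KEY,
    name text NOT NULL
);

-- Sparse columns split into a side table sharing the primary key
CREATE TABLE product_details (
    id     bigint PRIMARY KEY REFERENCES product (id),
    weight numeric,
    color  text
);

-- A view hides the split; the join is 1:1 on the primary key
CREATE VIEW product_full AS
SELECT p.id, p.name, d.weight, d.color
FROM product p
LEFT JOIN product_details d USING (id);
```

Queries go through `product_full`, so the decision to split (or later merge back) stays invisible to applications.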

Upvotes: 1

Related Questions