When should I store my data as JSON?

Question

This is one of the longer ones so buckle up :P!

Consider a users table. It has the ID, password, email, and some timestamps. Now, say I want to store more information, such as where they are based (or any other not-too-often retrieved piece of data). I can store it in a new column in the users table as JSON (pseudo-code)

{based_in: {address: "here", country:"xx", region: "great"}}

Or I could create a 1:1 related user_location table that stores that information.

CREATE TABLE `user_based_in` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `address` int(11) DEFAULT NULL,
  `country` int(11) DEFAULT NULL,
  `region` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
)

Lastly, I could also create a 1:n related table (user_info) that can hold this data in a key-value format, e.g. based_in_address, based_in_country.

CREATE TABLE `user_info` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `key` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` text COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_meta_key_index` (`key`)
)

With the first approach, I can add new details about the user without having to modify the back-end - so that's cool. But I feel it gets messy fast as I add more details.

With the second approach, I guess I get better indexing but that means having to tweak the back end every time I want to store a new user detail.

With the third approach, I can also add new details (keys) without changing the back end - but this solution seems kinda ugly to me in that it'll generate a lot of rows and can't be related (in the example above, only the first two options make it clear that based_in_address and based_in_country belong to the same object based_in).

I could also do a hybrid approach combining both the first and third method; creating a 1:n table that stores JSON in its value column.

And then, lets say I'd also like to add their favourite colour :P - would it be a new table, new column, new row, or a new object?.

What approach would work best for not-so-often-retrieved related data (that is often subject to change as in new keys may be added to store other details) in terms of performance, scalability, and flexibility?

When do we create related tables rather than creating new columns for 1:1 relationships (especially since the advent of JSON in MySQL)?

Ultimately, when should I store my data as JSON?

P.S. While I understand NoSQL might be able to solve this issue, I'm looking for a MySQL answer!

Schwern · Accepted Answer

Ultimately, when should I store my data as JSON?

JSON types are there for when your data might not fit into a neat system of typed columns. You should use it sparingly when you're not sure what data you want to put in the table. Or when the data is inherently unstructured, this is infinitely more flexible than a key/value table.

In your example, you're pretty sure what you want to store: location information. That's a pretty well defined problem, and it is well handled by your user_based_in table.

A good example of when to use JSON might be logging hits to API endpoints. The parameters are decided on by each API endpoint, something the table cannot know.

create table api_log (
    id integer primary key auto_increment,
    endpoint text not null,
    when timestamp not null,
    params json not null
);

A Practical Guide to MySQL JSON Data Type By Example uses a similar scenario.

Another example might be when you're just not sure what additional data you might want to collect.

create table stuff (
    ...normal table design...
    other json
);

This allows the application to speculatively store additional data, play around with it, and see if it's something they want to add to the structure. Once you do decide a given key/value is important, add it to the table as a column and transfer the data from JSON to the new column. This affords you all the benefits of a traditional relational database.

Let's say we decide things are important. We make a new things column. Set it to other->>'$.things'. And remove things from the JSON column to avoid data redundancy.

alter table stuff add column things text;

update stuff
set things = other->>'$.things',
    other = json_remove(other, '$.things')
where other is not null

When should I store my data as JSON?

Answers (1)

Related Questions