Reputation: 565
I'm looking at using PostgreSQL's jsonb column type for a new backend project that will mainly serve as a RESTful JSON API. I believe that PostgreSQL's jsonb will be a good fit for this project, as it will give me JSON objects without the need for conversion on the backend.
However, I have read that the jsonb data type slows down as keys are added, and my schema will need primary keys and foreign key references.
I was wondering whether having primary/foreign keys in their own columns (in the standard relational way) and a jsonb column for the rest of the data would be beneficial, or whether this would cause problems (now or down the road)?
In short, would:
CREATE TABLE car (id int, manufacturer_id int, data jsonb);
perform better or worse than:
CREATE TABLE car (data jsonb);
Especially when looking up foreign keys frequently?
Would there be downsides to the first one, from a performance or a schema perspective?
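For illustration, the frequent lookup I have in mind would look roughly like this in the two variants (a sketch with placeholder values, assuming the second variant stores manufacturer_id inside data):

-- variant 1: dedicated foreign key column
SELECT * FROM car WHERE manufacturer_id = 42;

-- variant 2: key stored inside the jsonb document
SELECT * FROM car WHERE (data->>'manufacturer_id')::int = 42;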
Upvotes: 10
Views: 8674
Reputation: 656714
All values involved in a PRIMARY KEY or FOREIGN KEY constraint must be stored as separate columns (best in normalized form). Constraints and references do not work for values nested inside a json / jsonb column.
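A minimal sketch of that layout (table and column names are just for illustration):

CREATE TABLE manufacturer (
   manufacturer_id serial PRIMARY KEY
 , name            text NOT NULL
);

CREATE TABLE car (
   car_id          serial PRIMARY KEY
 , manufacturer_id int NOT NULL REFERENCES manufacturer
 , data            jsonb  -- everything else; no constraints possible in here
);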
As for the rest of the data: it depends. Keeping it inside a jsonb (or json) value carries the well-known advantages and disadvantages of storing unstructured, document-type data.
For attributes that are present in all or most rows, it is typically better (faster, cleaner, smaller storage) to store them as separate columns. They are especially simpler and cheaper to update, and easier to index and query. The new jsonb has amazing index capabilities, but indexing dedicated columns is still simpler / faster.
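For example (index names are illustrative): a dedicated column takes a plain btree index, while generic lookups into the document need a GIN index, or an expression index for a single key:

CREATE INDEX car_manufacturer_id_idx ON car (manufacturer_id);  -- plain btree
CREATE INDEX car_data_gin_idx ON car USING gin (data);          -- generic jsonb index
CREATE INDEX car_data_color_idx ON car ((data->>'color'));      -- btree on one key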
For rarely used or dynamically appearing attributes, or if you want to store and retrieve JSON values without much handling inside the DB, look to jsonb.
For basic EAV structures with mainly character data, without nesting and no connection to JSON, I would consider hstore. There are also the xml (more complex and verbose) and json data types (mostly superseded by jsonb), which are losing ground.
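A minimal hstore sketch, assuming the additional module is installed (names and keys are illustrative):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE car_attr (
   car_id int PRIMARY KEY
 , attrs  hstore  -- flat key/value pairs, character data only
);

INSERT INTO car_attr VALUES (1, 'color => red, seats => 4');
SELECT attrs->'color' FROM car_attr WHERE car_id = 1;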
Upvotes: 16
Reputation: 45795
Which performs better? It depends on usage. It is the same question as comparing SQL (relational) and NoSQL (key-value or document) databases: for some use cases a NoSQL database performs very well, for others not.
The relational concept (a normalized schema) is optimized for typical OLTP usage: roughly 70% reads / 30% writes, many concurrent users, lots of updates, report calculation, some ad hoc queries. The relational concept is relatively general, with very wide applicability (record keeping, accounting, process support, ...). Usually it is not too bad anywhere.
It is clear that specialized databases (document, key-value, graph) can be significantly better (an order of magnitude faster) for their specialized use cases. But their applicability is significantly narrower; when you step outside the optimized use case, performance can be bad.
Another question is database size, i.e. the number of records. On production databases with hundreds of thousands of rows, the difference in performance can be significant; for smaller databases the impact may be negligible.
Postgres is a relational database, and my preference is to use a normalized schema for all important data in the database. Used well, it is terribly fast. Non-relational types (hstore, json, jsonb, xml) are perfect for fuzzy data, and significantly better than an EAV schema (which performs worse on bigger data).
If you need to make an important decision, prepare a prototype, fill it with the amount of data you expect (say, 3 years' worth), and check the speed of the important queries for your system. Attention: these benchmarks are strongly affected by the hardware used, the current load, and the current software.
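A minimal sketch of such a prototype benchmark, assuming the normalized car / manufacturer tables from the other answer (row counts and values are placeholders):

-- fill the prototype with synthetic rows
INSERT INTO manufacturer (name)
SELECT 'maker_' || g FROM generate_series(1, 100) g;

INSERT INTO car (manufacturer_id, data)
SELECT (g % 100) + 1, '{"color": "red", "seats": 4}'::jsonb
FROM generate_series(1, 1000000) g;

-- measure a typical foreign-key lookup
EXPLAIN ANALYZE SELECT * FROM car WHERE manufacturer_id = 42;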
Upvotes: 4