nkr

Reputation: 3058

Fewer rows versus fewer columns

I am currently modeling a table schema for PostgreSQL that has a lot of columns and is intended to hold a lot of rows. I don't know if it is faster to have more columns or to split the data into more rows.

The schema looks like this (shortened):

CREATE TABLE child_table (
  id bigint REFERENCES parent_table(id) ON DELETE CASCADE,
  position integer,
  account_id bigint REFERENCES accounts(account_id) ON DELETE CASCADE,
  attribute_1 integer,
  attribute_2 integer,
  attribute_3 integer,
  -- about 60 more columns
  PRIMARY KEY (id, position)
);

At most 10 rows of child_table relate to one row of parent_table. Their order is given by the value in position, which ranges from 1 to 10. parent_table is intended to hold 650 million rows, so with this schema I would end up with up to 6.5 billion rows in child_table.

Is it smart to do this? Or is it better to model it this way so that I only have 650 million rows:

CREATE TABLE child_table (
  PRIMARY KEY(id),
  id bigint,
  parent_id bigint REFERENCES parent_table(id) ON DELETE CASCADE,
  account_id_1 bigint REFERENCES accounts(account_id) ON DELETE CASCADE,
  attribute_1_1 integer,
  attribute_1_2 integer,
  attribute_1_3 integer,
  account_id_2 bigint REFERENCES accounts(account_id) ON DELETE CASCADE,
  attribute_2_1 integer,
  attribute_2_2 integer,
  attribute_2_3 integer,
  -- [...]
);

Upvotes: 3

Views: 183

Answers (1)

Schwern

Reputation: 164919

The number of columns and rows matters less than how well they are indexed. Indexes drastically reduce the number of rows that need to be scanned; in a well-indexed table, the total number of rows is largely irrelevant. If you try to smash 10 rows into one row, you make indexing much harder, and writing efficient queries that use those indexes becomes harder too.

Postgres has many different types of indexes to cover many different types of data and searches. You can even write your own (though that shouldn't be necessary).
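For example, a plain B-tree index on account_id (a hypothetical access pattern, assuming you often look up children by account) lets Postgres jump straight to the matching rows no matter how large the table gets:

```sql
-- Assuming lookups by account are common; a B-tree index makes
-- this fast regardless of how many rows child_table holds.
CREATE INDEX child_table_account_id_idx ON child_table (account_id);

-- Uses the index; touches only the matching rows, not all 6.5 billion.
SELECT * FROM child_table WHERE account_id = 12345;
```

With the wide single-row design, the same search would have to check account_id_1 through account_id_10 separately, and no single index covers that.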


Exactly 10 rows of child_table are at maximum related to one row of parent_table.

Avoid encoding business logic in your schema. Business logic changes all the time, especially arbitrary numbers like 10.


One thing you might consider is reducing the number of attribute columns; 60 is a lot, especially if they are really named attribute_1, attribute_2, and so on. If your attributes are not well defined, store them instead as a single JSON column of keys and values. Postgres' JSON operations are very efficient (provided you use the jsonb type) and provide a nice middle ground between a key/value store and a relational database.
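A sketch of that approach (the attributes column and the key names in the query are illustrative):

```sql
CREATE TABLE child_table (
  id bigint REFERENCES parent_table(id) ON DELETE CASCADE,
  position integer,
  account_id bigint REFERENCES accounts(account_id) ON DELETE CASCADE,
  attributes jsonb,  -- replaces the 60-odd loosely defined columns
  PRIMARY KEY (id, position)
);

-- A GIN index makes containment queries on the jsonb column fast.
CREATE INDEX child_table_attributes_idx
  ON child_table USING gin (attributes);

-- Find rows whose attributes contain a given key/value pair.
SELECT * FROM child_table WHERE attributes @> '{"color": "red"}';
```

This also means adding or dropping an attribute no longer requires an ALTER TABLE.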

Similarly, if any sets of attributes are simple lists (like address1, address2, address3), you can also consider using Postgres arrays.
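For instance (the addresses column is hypothetical):

```sql
-- One array column instead of address_1, address_2, address_3.
ALTER TABLE child_table ADD COLUMN addresses text[];

-- Array containment can also be served by a GIN index.
SELECT * FROM child_table WHERE addresses @> ARRAY['123 Main St'];
```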


I can't give better advice than this without specifics.

Upvotes: 2
