Eric H.

Reputation: 7014

Multiple Tables vs. Table Indexes in Postgres

I'm a front end developer with little SQL experience. I am considering developing a data query system for an organization I work for.

Currently, much of the data is in a series of spreadsheets. I have close to 100 worksheets (i.e. tables) derived from the same template (including a column for country), but with different values based on each worksheet's planning scenario (e.g. "efficient") and economic sector (e.g. "agriculture"). Each worksheet has about 8,000 rows.

Do I create a separate database table for each of these worksheets - even though the tables will have identical CREATE statements? In that case, I imagine I would create an index along these lines:

CREATE INDEX sector_scenario_lower_country ON sector_scenario(lower(country));

I would have to create this index 100 times (once for each sector_scenario table). When I want to find particular rows, I'd use my app to identify the correct table (this really shouldn't be much trouble or take much time) and then run the query:

SELECT col4, col5, col6 FROM sector_scenario WHERE lower(country) = 'brazil';
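If I went this route, I assume I could at least script the 100 CREATE INDEX statements with dynamic SQL rather than typing them all (a sketch; it assumes every table in the schema follows the template):

DO $$
DECLARE
    tbl text;
BEGIN
    -- Assumes the schema contains only the sector_scenario tables,
    -- each with a country column.
    FOR tbl IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        EXECUTE format('CREATE INDEX %I ON %I (lower(country))',
                       tbl || '_lower_country', tbl);
    END LOOP;
END $$;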

Or should I instead add columns for scenario and sector and copy all of the worksheets into a single table?
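For concreteness, I imagine the single table would look something like this (a sketch; col4 through col6 stand in for the template's real columns):

CREATE TABLE main_table (
    scenario text NOT NULL,
    sector   text NOT NULL,
    country  text NOT NULL,
    col4     numeric,   -- placeholder columns from the template
    col5     numeric,
    col6     numeric
);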

In this case, I'd create the following index just once:

CREATE INDEX main_table_idx ON main_table (scenario, sector, lower(country));

I'd then make the following query quite regularly:

SELECT col4, col5, col6 FROM main_table WHERE scenario = 'efficient' AND sector = 'agriculture' AND lower(country) = 'brazil';

Obviously the second option will be much less work to set up. However, can I expect comparable performance from it?

Upvotes: 0

Views: 252

Answers (2)

David S

Reputation: 13851

You don't give enough information for an extremely detailed answer, but I'm going to go on record as saying you most likely need/want one table. Performance will depend on too many things to list here (hardware, configuration, etc.), but PostgreSQL should have no problem with 800K rows. You will see better performance if you index the table properly. To do that, run your queries in pgAdmin III and use its explain/analyze feature. You will need to do some research on how to interpret the results, but you can always post your schema, queries, and query plans on SO for people to help you optimize poorly performing queries. I think you will find the Postgres community very helpful and eager to assist with performance problems.
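As a minimal sketch of that workflow (assuming the single-table schema and index from the question), EXPLAIN ANALYZE will show whether the planner actually uses the index:

-- Look for "Index Scan using main_table_idx" rather than "Seq Scan"
-- in the resulting plan.
EXPLAIN ANALYZE
SELECT col4, col5, col6
FROM main_table
WHERE scenario = 'efficient'
  AND sector = 'agriculture'
  AND lower(country) = 'brazil';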

Upvotes: 1

Gordon Linoff

Reputation: 1269693

The second solution is the right solution -- putting all the rows into one table and building an index for that one table.

You would separate the data into different tables only under very rare circumstances. The only one that I can think of is a user requirement that their data be stored separately from anyone else's data.

One question is whether the overall size of the indexes in the first approach would be comparable to the size of the single index in the second. Given that each of the 100 indexes in the first approach would, on average, end with a half-empty page, I suspect they might be larger in total. The additional overhead of storing the scenario would only occur once per value. Without actually testing the sizes, I think data size favors the single-table approach.
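If you want to test the sizes rather than guess, PostgreSQL can report them directly (table and index names here follow the question's examples):

-- Compare the on-disk size of the table and its index.
SELECT pg_size_pretty(pg_relation_size('main_table'))     AS table_size,
       pg_size_pretty(pg_relation_size('main_table_idx')) AS index_size;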

There are other situations where a large amount of data in a table could cause the table or index to overflow available memory. If that is an issue, then breaking up the table is a good thing. The right approach, though, is to use partitioning to split each segment into a separate table, not to manage a bunch of tables independently.
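As a sketch of what that looks like in current PostgreSQL (version 10 and later; the choice of sector as the partition key is just for illustration, reusing the question's schema):

-- List-partition the single table by sector; each partition is
-- managed by PostgreSQL, not by hand.
CREATE TABLE main_table (
    scenario text,
    sector   text,
    country  text,
    col4     numeric,
    col5     numeric,
    col6     numeric
) PARTITION BY LIST (sector);

CREATE TABLE main_table_agriculture
    PARTITION OF main_table FOR VALUES IN ('agriculture');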

Upvotes: 3
