Eric H.

Reputation: 7014

Multiple Tables vs. Table Indexes in Postgres

I'm a front end developer with little SQL experience. I am considering developing a data query system for an organization I work for.

Currently, much of the data is in a series of spreadsheets. I have close to 100 worksheets (i.e. tables) derived from the same template (including a column for country), but with different values based on each worksheet's planning scenario (e.g. "efficient") and economic sector (e.g. "agriculture"). Each worksheet has about 8,000 rows.

Do I create a separate database table for each of these worksheets - even though the tables will have identical CREATE statements? In that case, I imagine I would create an index along these lines:

CREATE INDEX sector_scenario_lower_country ON sector_scenario(lower(country));

I would have to create this index 100 times (once for each sector_scenario table). When I want to find particular rows, I'd use my app to identify the correct table (this really shouldn't be much trouble or take much time) and then run the query:

SELECT col4, col5, col6 FROM sector_scenario WHERE lower(country) = 'brazil';
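If I went this route, I assume I could at least script the 100 CREATE INDEX statements with dynamic SQL rather than typing them all (a sketch; it assumes every table in the schema follows the template):

DO $$
DECLARE
    tbl text;
BEGIN
    -- Assumes the schema contains only the sector_scenario tables,
    -- each with a country column.
    FOR tbl IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        EXECUTE format('CREATE INDEX %I ON %I (lower(country))',
                       tbl || '_lower_country', tbl);
    END LOOP;
END $$;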

Or should I instead add columns for scenario and sector and copy all of the worksheets into a single table?
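For concreteness, I imagine the single table would look something like this (a sketch; col4 through col6 stand in for the template's real columns):

CREATE TABLE main_table (
    scenario text NOT NULL,
    sector   text NOT NULL,
    country  text NOT NULL,
    col4     numeric,   -- placeholder columns from the template
    col5     numeric,
    col6     numeric
);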

In this case, I'd create the following index just once:

CREATE INDEX main_table_idx ON main_table (scenario, sector, lower(country));

I'd then make the following query quite regularly:

SELECT col4, col5, col6 FROM main_table WHERE scenario = 'efficient' AND sector = 'agriculture' AND lower(country) = 'brazil';

Obviously the second option will be much less work to set up. However, can I expect comparable performance from it?

Upvotes: 0

Views: 252

Answers (2)

David S

Reputation: 13851

You don't give enough information for an extremely detailed answer, but I'm going to go on record as saying you most likely need/want one table. Performance will depend on too many things to list here (hardware, configuration, etc.), but PostgreSQL should have no problem with 800K rows. You will see better performance if you index the table properly. To do that, run your queries in pgAdmin III and use its explain/analyze feature. You will need to do some research on how to interpret the results, but you can always post your schema, queries, and query plans on SO for people to help you optimize poorly performing queries. I think you will find the Postgres community very helpful and eager to assist with performance problems.
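As a minimal sketch of that workflow (assuming the single-table schema and index from the question), EXPLAIN ANALYZE will show whether the planner actually uses the index:

-- Look for "Index Scan using main_table_idx" rather than "Seq Scan"
-- in the resulting plan.
EXPLAIN ANALYZE
SELECT col4, col5, col6
FROM main_table
WHERE scenario = 'efficient'
  AND sector = 'agriculture'
  AND lower(country) = 'brazil';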

Upvotes: 1

Gordon Linoff

Reputation: 1269693

The second solution is the right solution -- putting all the rows into one table and building an index for that one table.

You would separate the data into different tables only under very rare circumstances. The only one that I can think of is a user requirement that their data be stored separately from anyone else's data.

One question is whether the overall size of the indexes in the first approach would be comparable to the size of the single index in the second. Given that each of the 100 indexes in the first approach would, on average, end with a half-empty page, I suspect they might be larger in total. The additional overhead of storing the scenario would only occur once per value. Without actually testing the sizes, I think data size favors the single-table approach.
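If you want to test the sizes rather than guess, PostgreSQL can report them directly (table and index names here follow the question's examples):

-- Compare the on-disk size of the table and its index.
SELECT pg_size_pretty(pg_relation_size('main_table'))     AS table_size,
       pg_size_pretty(pg_relation_size('main_table_idx')) AS index_size;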

There are other situations where a large amount of data in a table could cause the table or index to overflow available memory. If that is an issue, then breaking up the table is a good thing. The right approach, though, is to use partitioning to split each segment into a separate table, not to manage a bunch of tables independently.
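As a sketch of what that looks like in current PostgreSQL (version 10 and later; the choice of sector as the partition key is just for illustration, reusing the question's schema):

-- List-partition the single table by sector; each partition is
-- managed by PostgreSQL, not by hand.
CREATE TABLE main_table (
    scenario text,
    sector   text,
    country  text,
    col4     numeric,
    col5     numeric,
    col6     numeric
) PARTITION BY LIST (sector);

CREATE TABLE main_table_agriculture
    PARTITION OF main_table FOR VALUES IN ('agriculture');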

Upvotes: 3
