Reputation: 590
I am setting up a database which has hourly tables. Each table is created in a single copy statement. The purpose of the tables is to hold an indexed summary of the data held in standard files that compress really well. For the record these files hold Internet flow data.
Sometimes we need to search over a months files looking for specific addresses and working through the files one at a time takes a long time so the intention is to build an index of each hour file which lists that IP Addresses for that hour. To search we go serially through the time range we are interested in and do a quick search on each hour to see if we need to unpack and search the corresponding file on disk.
I am using Postgres since it supports IP addresses natively. So for each hour I create a table with a name related to the corresponding file on disk (date and time, surprise). I need indexes for the source and destination addresses but should I create those indexes before or after I load the data using a copy statement.
I guess another way of putting it is whether it is faster to do the indexing when the data is being loaded or in one go after it is all done.
Once a table is loaded it will normally never be modified until it is deleted.
Upvotes: 0
Views: 506
Reputation: 2205
What version of Postgres are you on?
Have you considered partitioning?
And, yes, it ought to be faster to build the index at the end, but that leaves out a lot of nuance.
Upvotes: 1