Reputation: 139

Conditionally linking Postgres rows to data in various other tables

I have a product table that is updated using CSV feeds from various suppliers. Each feed has its own table; however, products can appear multiple times in the same supplier table, and in multiple supplier tables. Each product can only occur once in our main table, though. I don't anticipate ever using more than about ten different supplier tables. Tables are updated at least daily, and at most every 6-8 hours, and read speeds are a much higher priority than write speeds. There are usually about 500,000 enabled products at any given time.

My first plan was to store the table name and primary key ID in that table for each product, then recalculate it during each update, but according to the responses here, having to do that is an indication that the database isn't designed correctly.

Using a view to combine these tables into a single virtual table seems like it'd help a lot with the organization. That way, I can just create a rule to make one column an SQL query, then index that column to increase search/read speed. The rules that determine where to pull supplier information from are somewhat involved, and need to take country and price into account, as well as perhaps a few other things.

So I guess the question here is, is there a correct way of doing this? Or is it going to be messy no matter how I do it? Also, am I on the right track?

Upvotes: 0

Answers (2)

Jay

Reputation: 27492

Hmm, why not just create one product table that contains data from all suppliers? Have a field in that table that identifies which supplier. When you get your input feeds, update this one table rather than having a separate table for each supplier. If you're using COPY to import a CSV file into a db table, fine, but then the imported table is just a temporary work table. Promptly copy the data from there into the "real", unified table. Then the import table can be dropped or truncated, or more likely you keep it around for troubleshooting. But you don't use it within the program.

You should be able to copy from the import table to the unified table with a single insert statement. Even if the tables are large I'd expect that to be fast. It would almost surely be faster overall to do one mass insert for each import than to have a view that does a union on 10 tables and try to work with that. If the unified table has all the data from all suppliers plus a supplier field, then I don't see why you would ever need to query the raw import tables. Except, that is, for troubleshooting problems with the import, but fine, so you keep them around for that. Unless you're constrained on disk space so that keeping what amounts to duplicates of every record is a problem, I'd think this would be the easy solution. If disk space is an issue, then drop the import table immediately after copying the data to the unified table, and keep the original raw import on backup media somewhere.
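As a rough sketch of that import-then-copy flow: the example below uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs anywhere, and every table name, column name, and supplier ID in it is made up for illustration. Deleting a supplier's old rows before the insert is just one assumed way to refresh the data; the point is the single INSERT ... SELECT that tags each row with its supplier.

```python
import sqlite3

# SQLite stands in for Postgres; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_supplier_a (sku TEXT, name TEXT, price REAL, country TEXT);
    CREATE TABLE supplier_product (
        supplier_id TEXT NOT NULL,
        sku         TEXT NOT NULL,
        name        TEXT,
        price       REAL,
        country     TEXT
    );
""")

# Stand-in for the CSV load (COPY in Postgres) into the work table.
feed_rows = [
    ("A-100", "Widget", 9.99, "US"),
    ("A-100", "Widget", 11.49, "CA"),
    ("A-200", "Gadget", 24.00, "US"),
]
conn.executemany("INSERT INTO import_supplier_a VALUES (?, ?, ?, ?)", feed_rows)

# Fixed allow-list, because a table name cannot be passed as a bound parameter.
IMPORT_TABLES = {"supplier_a": "import_supplier_a"}

def load_feed(conn, supplier_id):
    """Replace one supplier's rows in the unified table from its work table."""
    import_table = IMPORT_TABLES[supplier_id]
    with conn:  # one transaction: readers never see a half-loaded feed
        conn.execute(
            "DELETE FROM supplier_product WHERE supplier_id = ?", (supplier_id,)
        )
        conn.execute(
            "INSERT INTO supplier_product (supplier_id, sku, name, price, country) "
            f"SELECT ?, sku, name, price, country FROM {import_table}",
            (supplier_id,),
        )

load_feed(conn, "supplier_a")
```

After this, queries only ever touch supplier_product, and import_supplier_a can be truncated or kept for troubleshooting.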

Upvotes: 1

John Bollinger

Reputation: 181149

Using a view unifying all your feed tables might well simplify the form of your queries, but you cannot index an ordinary view. (Well, in Oracle I think you can index a MATERIALIZED VIEW, but that's a special case.)

Structurally, I find it a bit suspect that you split your supplier feeds into separate tables. Doing so may simplify and speed updates from the supplier feeds, and it is certainly the fastest alternative for queries against specific, individual feeds. But it's ugly for updating (recomputing?) the main table, and it is flatly unsound as a way to relate rows of the main table back to the particular supplier feed from which they were drawn.

If you need fast queries against the supplier feeds, independent of the main table, and you also need the main table to be related to a detail table containing supplier-specific information, then perhaps your best bet would be to maintain a physical auxiliary table as the UNION ALL of all the per-supplier tables (this requires those tables to have the same structure), each with a distinct supplier ID. In Oracle, you can automate that as a MATERIALIZED VIEW, but with most DBMSs you would need to maintain that table manually.

The auxiliary table can be indexed, can be joined to the main table as needed in queries, and can be queried fairly efficiently. If appropriate, it can be used to update the main table.
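To make the auxiliary-table idea concrete, here is a minimal sketch. It again uses Python's sqlite3 module in place of Postgres, and the feed tables, columns, supplier IDs, and sample rows are all invented for the example. It builds the physical UNION ALL table, indexes it, and joins it to the main table.

```python
import sqlite3

# SQLite stands in for Postgres; all names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed_a (sku TEXT, price REAL, country TEXT);
    CREATE TABLE feed_b (sku TEXT, price REAL, country TEXT);
    CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT);

    INSERT INTO feed_a VALUES ('P1', 10.0, 'US'), ('P2', 5.0, 'US');
    INSERT INTO feed_b VALUES ('P1', 8.5, 'US'), ('P1', 9.0, 'DE');
    INSERT INTO product VALUES ('P1', 'Widget'), ('P2', 'Gadget');

    -- Physical auxiliary table: UNION ALL of the feeds, each tagged
    -- with a distinct supplier ID.
    CREATE TABLE supplier_offer AS
        SELECT 'a' AS supplier_id, sku, price, country FROM feed_a
        UNION ALL
        SELECT 'b' AS supplier_id, sku, price, country FROM feed_b;

    -- Unlike a plain view, this table can be indexed.
    CREATE INDEX supplier_offer_sku_idx ON supplier_offer (sku, country);
""")

# Join the main table to the auxiliary table: all US offers, cheapest first.
offers = conn.execute("""
    SELECT p.sku, p.name, o.supplier_id, o.price
    FROM product AS p
    JOIN supplier_offer AS o ON o.sku = p.sku
    WHERE o.country = 'US'
    ORDER BY p.sku, o.price
""").fetchall()
```

The table has to be rebuilt or updated after each feed import. For what it's worth, PostgreSQL 9.3 and later also support CREATE MATERIALIZED VIEW, which can be indexed and is refreshed on demand with REFRESH MATERIALIZED VIEW, so that may be an option instead of maintaining the table by hand.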

Upvotes: 1
