Reputation: 31
I have a database of automobile classified listings.
After 90 days, the classified listing is no longer valid to be displayed (the listing expires); however, I want to retain the listing for archive purposes.
Question: From the perspective of both database design best practice and query performance, is it better to A) keep the old listing in the same table as the current listings, or B) move the expired listing to an expired table and delete it from the current listing table?
In other words,
Option A):
table_classified_listing:
car_id
expired = true | false
...
Option B):
// only current listing in this table (expired = false)
table_classified_listing:
car_id
...
// only expired listing in this table (expired = true)
expired_table_classified_listing:
car_id
...
UPDATE:
My concern with Option A is that when I run EXPLAIN in my MySQL database, it says that it is using expired as the key to index on. However, what matters more for my search performance is that it uses the price field, since I'm searching on price > X. That is why I'm considering Option B.
Upvotes: 3
Views: 4049
Reputation: 27474
I would keep them in one table. Otherwise:
(a) You have two tables with identical columns. Then any time you change the table structure, you have to remember to change both tables identically. Sooner or later someone will forget -- or get the bright idea that data in one table is not needed in the other -- and now your design gets more complicated. Pretty soon you're writing exactly the same logic twice: once to retrieve from the "current" table and again to retrieve from the "archive" table. But then someone changes one piece of code and forgets to make the same change to the other. The next person to come along can't be sure whether they differ for a good reason or because someone just forgot.
(b) It seems likely that you will have queries that want to hit both tables, like "tell me all the ads with asking price over $20,000 in the past 12 months", where some of the ads could be current and others archived. With two tables, these queries become unions or complicated joins. With one table, you simply leave out the "expired is true" or "expired is false" condition.
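To make point (b) concrete, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names listing_a, listing_b and expired_listing_b and the sample rows are made up for illustration, and the "past 12 months" condition is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: one table with an expired flag (hypothetical name)
    CREATE TABLE listing_a (car_id INTEGER PRIMARY KEY, price INTEGER, expired INTEGER);
    -- Option B: two tables with identical columns (hypothetical names)
    CREATE TABLE listing_b (car_id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE expired_listing_b (car_id INTEGER PRIMARY KEY, price INTEGER);
""")

# Made-up sample data: (car_id, price, expired)
rows = [(1, 25000, 0), (2, 18000, 0), (3, 30000, 1), (4, 9000, 1)]
conn.executemany("INSERT INTO listing_a VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO listing_b VALUES (?, ?)",
                 [(c, p) for c, p, e in rows if not e])
conn.executemany("INSERT INTO expired_listing_b VALUES (?, ?)",
                 [(c, p) for c, p, e in rows if e])

# Option A: current and expired ads together, just omit the expired condition.
one_table = conn.execute(
    "SELECT car_id FROM listing_a WHERE price > 20000 ORDER BY car_id"
).fetchall()

# Option B: the same question needs a UNION ALL over both tables.
two_tables = conn.execute("""
    SELECT car_id FROM listing_b WHERE price > 20000
    UNION ALL
    SELECT car_id FROM expired_listing_b WHERE price > 20000
    ORDER BY car_id
""").fetchall()

print(one_table, two_tables)
```

Both queries return the same listings; the difference is only in how much SQL has to be written and kept in sync.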
As to the performance issue, that's easy: Create a multi-field key that includes whatever you need to include. expired + price or expired + modelname seem likely keys. You probably want to put expired first because most of your queries are probably going to want non-expired records, but I'm just guessing. Choosing what's worth indexing on is a complicated decision, but when there are obvious common queries on multiple fields, just do it.
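As a minimal sketch of the multi-field key, the snippet below uses Python's built-in sqlite3 module in place of MySQL, so the planner output will not match MySQL's EXPLAIN exactly. The index name idx_expired_price and the generated rows are assumptions for illustration. In MySQL the corresponding statement would be along the lines of CREATE INDEX idx_expired_price ON table_classified_listing (expired, price).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classified_listing "
    "(car_id INTEGER PRIMARY KEY, price INTEGER, expired INTEGER)"
)
# Made-up data: 1000 listings, every second one expired.
conn.executemany(
    "INSERT INTO classified_listing (price, expired) VALUES (?, ?)",
    [(5000 + 100 * i, i % 2) for i in range(1000)],
)

# Composite index: the equality column (expired) first, the range column (price) second.
conn.execute(
    "CREATE INDEX idx_expired_price ON classified_listing (expired, price)"
)

# Ask the planner how it would run the typical "current listings over X" query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT car_id FROM classified_listing WHERE expired = 0 AND price > 20000"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the plan detail
print(plan_text)
```

With the equality column first, one index can serve both the expired filter and the price range, which is what the question's EXPLAIN output was missing.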
Upvotes: 1
Reputation: 912
Here is what I understand:
If I understood the above correctly, the next questions are: how often is your expired data used, and what is it used for? As @ghills pointed out, SQL unions might slow you down.
If expired data does not need to be online, it might make sense to archive it to a separate table, especially if the number of expired rows can exceed the number of active rows.
If you keep them in the same table, "where expired=false" could end up being your constant companion, and since selectivity will be low (i.e. lots of expired rows), indexing on the "expired" column alone will not get you a lot of bang for the buck. (Oracle has bitmap indexes, but that may not apply here at all.)
Upvotes: 1
Reputation: 1620
There are at least two additional options as well for the general problem of keeping old data:
Back to your two offered solutions:
Upvotes: 1
Reputation: 24159
General advice (you have to fill the gaps ;-) ):
Performance will only be significant in some cases (over a million records, huge row size...).
Will you query the two tables with unions, or with identical queries? If you will not query the tables using the same queries, then I suggest different tables (a possible performance gain as the number of records grows, but mostly a gain in meaning, since each table then says clearly what it holds).
One problem with duplication is that it might increase the work (writing queries, testing them...). But all technologies (especially modern ones) let you reduce or eliminate the duplication.
For example, with an ORM, you can have an abstract entity that maps the common fields but no table, and two subclasses that map to your tables. No duplication of column information. And the ORM can create your database scripts too, so you don't even have to write these (although you should review them by hand for a production database, of course).
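The "define the columns once" idea can be sketched even without an ORM. The snippet below is not an ORM mapping, only a plain-Python illustration of the same principle, using the built-in sqlite3 module. The names COMMON_COLUMNS and create_table_sql are hypothetical, and the column list is limited to the two fields named in the question.

```python
import sqlite3

# Hypothetical single source of truth for the shared columns.
COMMON_COLUMNS = "car_id INTEGER PRIMARY KEY, price INTEGER"


def create_table_sql(table_name):
    """Build a CREATE TABLE statement from the one shared column list."""
    return f"CREATE TABLE {table_name} ({COMMON_COLUMNS})"


conn = sqlite3.connect(":memory:")
for name in ("classified_listing", "expired_classified_listing"):
    conn.execute(create_table_sql(name))


def column_names(table_name):
    """Return the column names SQLite reports for a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table_name})")]


print(column_names("classified_listing"))
print(column_names("expired_classified_listing"))
```

Because both tables are generated from the same definition, a column added to COMMON_COLUMNS shows up in both, which is the kind of duplication an ORM's abstract entity would remove for you.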
UPDATE after the update in the question:
You can create the indexes you want, don't worry. If the query you want to be fast is the one for non-expired data with a price over X, create an index on (expired, price) and you're just fine :-)
Upvotes: 4
Reputation: 37655
At the rate that anyone accumulates listings such as you describe, it would be a long time before performance would degrade. And hardware and software performance increase faster.
Don't make something complicated until you are sure you need it and simple won't work. Keep it in one table. See the question about pessimizations - this is one.
Upvotes: 2
Reputation: 10482
Personally I would say to move all the expired ones into a separate table. As the database grows, you are going to want better performance from your 'live' records, because those are likely to be hit the most often.
All the old records will cause the table to grow continually, and that means slower queries, even with the query optimization that takes place.
EDIT: As someone else mentioned, one big con to this approach shows up if you plan on combining the live data and the archived data frequently. If you will always reference them separately then great, but if not, you will need lots of joins and unions to pull the data together, which is not ideal.
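For reference, the "move" step could look roughly like the sketch below, which uses Python's built-in sqlite3 module instead of MySQL. The listed_on column, the archive_expired function and the sample dates are assumptions; the question only states that listings expire after 90 days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classified_listing
        (car_id INTEGER PRIMARY KEY, price INTEGER, listed_on TEXT);
    CREATE TABLE expired_classified_listing
        (car_id INTEGER PRIMARY KEY, price INTEGER, listed_on TEXT);
    INSERT INTO classified_listing VALUES
        (1, 25000, '2024-01-01'),
        (2, 18000, '2024-05-20');
""")


def archive_expired(conn, today, max_age_days=90):
    """Copy listings older than max_age_days to the archive, then delete them."""
    cutoff = f"-{max_age_days} days"
    # "with conn" commits both statements together and rolls back on error,
    # so a listing is never lost or duplicated halfway through the move.
    with conn:
        conn.execute(
            "INSERT INTO expired_classified_listing "
            "SELECT * FROM classified_listing WHERE listed_on < date(?, ?)",
            (today, cutoff),
        )
        conn.execute(
            "DELETE FROM classified_listing WHERE listed_on < date(?, ?)",
            (today, cutoff),
        )


archive_expired(conn, today="2024-06-01")
live = conn.execute("SELECT car_id FROM classified_listing").fetchall()
archived = conn.execute("SELECT car_id FROM expired_classified_listing").fetchall()
print(live, archived)
```

Note that SELECT * only works while both tables keep exactly the same columns in the same order, which is one of the maintenance costs of the two-table design.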
Upvotes: 1
Reputation: 13486
Option A). That way you have all your data in one place and can more easily create queries for reporting, listing a user's historical entries, etc. Any speed issues should be mitigated by the database's index on that column. Option B) is premature optimisation.
Upvotes: 6
Reputation: 4503
There is no such thing as a universal best practice. However, if the table tends to become huge and your searches are taking too much time, then you may need to archive the items in a separate table or similar. Otherwise, you can implement proper indexing to make things faster. It really depends on the amount and type of data you are considering.
Upvotes: 0
Reputation: 1850
Don't use B; it basically splits one entity's attributes across two tables.
What I would do instead is use two date columns, ValidFromDate and ValidToDate.
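A minimal sketch of the two-date-column approach, again with Python's built-in sqlite3 module standing in for MySQL. The sample rows, the current_listings helper and the choice to treat ValidToDate as an exclusive end date are assumptions, not something the answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classified_listing (
        car_id INTEGER PRIMARY KEY,
        price INTEGER,
        ValidFromDate TEXT,  -- ISO dates, so string comparison sorts correctly
        ValidToDate TEXT     -- here: 90 days after ValidFromDate
    );
    INSERT INTO classified_listing VALUES
        (1, 25000, '2024-01-01', '2024-03-31'),
        (2, 18000, '2024-05-01', '2024-07-30');
""")


def current_listings(conn, today):
    """Return car_ids whose validity window contains the given date."""
    return conn.execute(
        "SELECT car_id FROM classified_listing "
        "WHERE ValidFromDate <= ? AND ? < ValidToDate ORDER BY car_id",
        (today, today),
    ).fetchall()


print(current_listings(conn, "2024-06-01"))
```

With this layout no job has to flip a flag or move rows when a listing expires; a listing simply stops matching the date condition, and the archive is whatever falls outside the window.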
Upvotes: 3