ricemaster

Reputation: 25

How to integrate cached business data into a table of history records according to a specific time

The scenario is as follows:

  1. Assume there are hundreds of card readers installed at fixed positions, and thousands of cards that may pass by any reader at random.
  2. Each reader will read card data at least 10 times per hour.
  3. During one READ period, a reader will read a card several times, depending on how long the card stays in the field where the reader can pick it up. When a reader reads a card, one READ_EVENT is generated, containing: reader_id, card_id, times_of_reads, first_seen_time, last_seen_time.
  4. A reader can upload a READ_EVENT to the backend system immediately.
  5. A reader can also cache READ_EVENTs locally if the network is down, and it will re-transmit the cached data once the network is back up.

So I have a READ_EVENT table in the database to save all the events.
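A minimal sketch of that table, with assumed column types and a surrogate key (the real schema may differ):

create table read_event (
    id               bigint     not null primary key,  -- surrogate key, assumed
    reader_id        integer    not null,
    card_id          integer    not null,
    times_of_reads   integer    not null,
    first_seen_time  timestamp  not null,
    last_seen_time   timestamp  not null
);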

When a cached READ_EVENT arrives, I have to review all the history data to find out whether it should be integrated with an existing READ_EVENT.

To be clear, "integrate" means checking whether the two durations (from "first_seen_time" to "last_seen_time") of a READ_EVENT already in the DB and the cached READ_EVENT have a common period, i.e. overlap.
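So an existing row and a cached event overlap when neither interval ends before the other begins; expressed as a SQL condition (bind parameter names are illustrative), roughly:

select *
from read_event
where reader_id = :reader_id
  and card_id = :card_id
  and first_seen_time <= :cached_last_seen_time   -- existing row starts before the cached event ends
  and last_seen_time  >= :cached_first_seen_time  -- and ends after the cached event starts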

Here is the question:

Because the "first_seen_time"/"last_seen_time" in cached data could be any time (yesterday, last month, last year), and the table grows larger and larger, it becomes very difficult to locate the READ_EVENTs that should be integrated. How can I optimize the design of the database?

Upvotes: 0

Views: 36

Answers (1)

TAM

Reputation: 1741

You'll need one or more indexes, depending on your exact requirements and the database queries resulting from them. With those indexes in place, the total number of records is of minor importance, as a query over a small index range will still be fast and yield no more than a couple of records.
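For instance, such an index could be created along these lines (the index name is just an example; exact syntax varies between databases):

create index ix_read_event_reader_card_time
    on read_event (reader_id, card_id, last_seen_time);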

Assume that you have an index on reader_id, card_id and last_seen_time. Now you want to know whether a current event on that reader and card can be joined with the previous events. As far as I understand your requirements, that can only affect the latest record. So a query like

select *
from read_event
where reader_id = :reader_id and card_id = :card_id
order by last_seen_time desc

will yield the latest event as the first record, so only one record needs to be fetched from the database, independent of how many records the table contains.
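Since only the first row is needed, you can also state that in the query itself; standard SQL offers FETCH FIRST, while MySQL for example uses LIMIT 1 instead:

select *
from read_event
where reader_id = :reader_id and card_id = :card_id
order by last_seen_time desc
fetch first 1 row only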

Of course, if the record count becomes huge, this might become a problem in terms of sheer space or other use cases working on those data. Now, from your numbers, let's estimate 3 million records per year. After three years, you have 10 million. Do you still need each of these records after three years? So the next step would be to decide, from your functional requirements, how long you need old data on a record-per-record basis, and to aggregate them according to the actual needs of everyday business.

But I would perform such a step only when the actual needs arise. Often, the requirements on a collection of data only evolve with their usage. So, if you throw data away too early, a use case might appear for which they are needed. That's the premature optimization anti-pattern.
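If that point is ever reached, one possible shape for such an aggregation, sketched in PostgreSQL-style syntax (the summary table read_event_monthly and the three-year cutoff are purely illustrative):

-- roll events older than three years up into a hypothetical monthly summary table
insert into read_event_monthly (reader_id, card_id, month_start, total_reads)
select reader_id,
       card_id,
       date_trunc('month', first_seen_time) as month_start,
       sum(times_of_reads)
from read_event
where last_seen_time < current_date - interval '3 years'
group by reader_id, card_id, date_trunc('month', first_seen_time);

-- then remove the detail rows that were aggregated
delete from read_event
where last_seen_time < current_date - interval '3 years';

Whether a monthly roll-up is the right granularity depends entirely on the use cases that actually emerge.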

Upvotes: 1
