Stefan
Stefan

Reputation: 2194

MySQL Performance - "Redundancy" vs Speed/System Performance

So... assuming i have a database with three tables:
Table clients
Table data
and Table clients_to_data

And I have a API which allows Clients to Access data from Table data. Every client has a record in Table clients (with things like IP adress etc.) To log who accesses what, i'm logging in the table clients_to_data (which contains the ID for table clients, table data and a timestamp.)
Every time a user access my API, he get's logged in the clients_to_data table. (So records in clients and data are not updated, just read.)
I also want to be able to get the amount of hits per client. Pretty easy, just query the clients_to_data table with a client_id and count the results. But as my DB grows, i'll have tenthousands of records in the clients_to_data table. And here's my question:
Is it a better practice to add a field "hits" to Table clients that stores the amount of hits for that user and increment it every time the user queries the API
So this would be adding redundancy to the DB which i've heard generally is a bad thing. But in this case i think it would speed up the process of retrieving the amount of hits.
So which method is better and faster in this case? Thanks for your help!

Upvotes: 1

Views: 643

Answers (3)

Tony Hopkinson
Tony Hopkinson

Reputation: 20320

Faster when? Appending to the table will be faster , than finding the record and updating it, much faster than reading it, incrementing and updating it.

However having hits "precalulated", will be faster than the aggregate query to count them.

What you gain on the swings you lose on the roundabouts, which choice you make depends on your current usage patterns. So are you prepared to slow down adding a hit, to gain a signicant boost on finding out how many you've had?

Upvotes: 2

beck03076
beck03076

Reputation: 3308

Now that our table structures are all clearly defined, lets get to work.

You want to record something in the DB which is the number of times every client has accessed the data, in other terms,

Insert a record into a table "client_to_data" for every clients "impression".

You are worried about 2 things,

1. Redundancy
2. Performance when retrieving the count

What about the performance when storing the count.(Insert statements)..?

This is a classic scenario, where I would write the data to be inserted into memcache, and do a bulk insert at the end of the day.

More importantly, I will normalize the data before inserting it to the DB. As to select, create indexes. If its text, install sphinx.

Thanks.

Upvotes: 1

bumperbox
bumperbox

Reputation: 10214

Obviously selecting a single integer column from a table will be faster then selecting a count() of rows from a table.

The complexity trade off is a bit moot. 1 way you need to write a more complex sql, the other way you will need to update/insert 2 tables in your code.

How often is the number of hits queried? Do you clients look it up, or do you check it once a month? If you only look now and then I probably wouldn't be too concerned about the time taken to select count(*).

If your clients look up the hit count with every request, then I would look at storing a hits column.

Upvotes: 2

Related Questions