Doua Beri
Doua Beri

Reputation: 10949

mysql: average time for the connected visitors - optimization

The system I work is a little more complex to explain here but I can reduce it to something more simple.

Let's say I have a simple chat server and I count the seconds every client stays connected and save it in a table(I'm using mysql). So every time a client connects I save the time he stays connected in seconds (int) If he disconnects and connects again I save this info in another row because this is how I want. The number of times a client connects to the server in a day is between 50k-500k or even more(I know, I know but this is related to my complex system but irrelevant to my question here).

My problem is that I want to show to every client some stats about his visits similar to google analytics(by days), to be more specific I'm interested in showing the average time he spent on a certain day.

I'm looking at an optimized way to do this. So far I've thought about the following solutions:

  1. use select avg(time) from table where date=.... but speed problems might occur

  2. save the avg time in a separate table for every day and user. This solutions is ok, but raises another question: how do I save the average time? here are the situations I was thinking:

a) use mysql trigger to update the stats every time a client is connecting (using INSERT AFTER ...) this solution is not bad, however like I said the client can connect 500k times/day which means 500k times mysql needs to calculate the average time

b) make a separate application similar to a cron job or a timer task that updates the stats every X hours ,this way I know the mysql server will be used only once a few hours depending on the number of clients I have.

So far I'm thinking of implementing the 2.b solution, but I said to ask you first before proceeding. If you have better ideas please share.

Thanks

Upvotes: 0

Views: 148

Answers (2)

ruakh
ruakh

Reputation: 183446

In my opinion, this:

speed problems might occur

is not enough reason to avoid what is certainly the simplest and least error-prone solution, especially when it is so easy to change if and when speed problems do occur.

That being said — in the event of speed problems, I agree with your assessment: better to use a scheduled job that computes the average than to add a trigger that will impose a penalty on every insert.

Upvotes: 1

ChristopherS
ChristopherS

Reputation: 883

You can use solution a but don't recalculate the average over and over again. You can do it by storing the current average and the amount of items that where used to calculate the average. Your formula would be like:

(current_average*number_of_old_items+new_value)/(number_of_old_items+1)

Upvotes: 1

Related Questions