John Weisz
John Weisz

Reputation: 31962

How to get standard deviation of grouped rows?

I want to calculate the standard deviation between page views on my site. I'd like to do this using pure MySQL - without querying the whole table to the webserver - and return a single number to the PHP code for further use. Each page view is stored as a visitor_id - page_id - visit_count trio as per the following schema:

 +============+=========+=============+
 | visitor_id | page_id | visit_count |
 +============+=========+=============+
 | 1          | 2       | 7           |
 +------------+---------+-------------+
 | 2          | 2       | 4           |
 +------------+---------+-------------+
 | 1          | 1       | 17          |
 +------------+---------+-------------+
 | 3          | 2       | 12          |
 +------------+---------+-------------+
 | 1          | 3       | 639478      |
 +------------+---------+-------------+
 | 2          | 1       | 6           |
 +------------+---------+-------------+

page_id refers to a PRIMARY_KEY in the pages table, visitor_id refers to a PRIMARY_KEY in the visitors table. The above table's primary key is the visitor_id - page_id pair, since the same page seen by the same visitor is recorded by increasing the visit_count of the corresponding row, instead of creating a new one.

Before calculating standard deviation, the entries should be grouped together by page_id, their visit_count summed (visitor_id can be ignored here), so, effectively, I want to calculate the deviation of the following:

 +=========+=============+
 | page_id | visit_count |
 +=========+=============+
 | 2       | 23          |
 +---------+-------------+
 | 1       | 23          |
 +---------+-------------+
 | 3       | 639478      |
 +---------+-------------+

I'm aware of the possible PHP solutions, but I'm interested in a MySQL one.

Upvotes: 0

Views: 57

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270301

If you want the standard deviation for each page (i.e., the visitors are the population):

select page_id, sum(visit_count) as visit_count, std(visit_count) as visit_std
from table1
group by page_id;

If you want the standard deviation over the pages:

select std(visit_count) as page_std
from (select page_id, sum(visit_count) as visit_count
      from table1
      group by page_id
     ) t;

Upvotes: 1

G_V
G_V

Reputation: 2434

You could create a new table that stores timestamp + current views so you can view a history of changes in views. You'd be able to check the last two timestamped entries and how much the difference is between the two as well as a whole bunch of other stuff you haven't even thought of yet. Like graphs. Or pie charts showing activity increases per week day. Mmmm pie.

Upvotes: 0

Related Questions