Flora Mitchell
Flora Mitchell

Reputation: 111

mysql having... > avg() doesn't work as expected

I've created two views to help calculate user_diary_number and then select users of whom diary numbers > average of total user's user_diary_number.

two views are like below:

create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary

group by user_id
);

and second using having and avg:

create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)

);

What the problem is now, the second view only have 1 row of result. and absolutely, we have more than 1 users whose diary number > average diary_num. Indeed, I have 251 diaries in total and 103 users. Some of users have 9, 4, 5 diaries. But the result only comes in only 1 user who have 3 diaries.

my relative tables are:

CREATE TABLE IF NOT EXISTS `school` (
  `school_id` int(11) NOT NULL,
  `school_name` varchar(45) NOT NULL,
  `location` varchar(45) NOT NULL,
  `master` varchar(45) NOT NULL,
  `numbers_of_student` int(11) NOT NULL,
  PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `user_diary` (
  `diary_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `title` varchar(45) NOT NULL,
  `content` varchar(255) NOT NULL,
  `addtime` DATETIME NOT NULL,
  PRIMARY KEY (`diary_id`,`user_id`),
  KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Is it any problems with cross join? or something else? thanks a lot!

Upvotes: 1

Views: 3718

Answers (3)

spencer7593
spencer7593

Reputation: 108370

Something like this should return the resultset you are looking for:

 SELECT u.user_id
      , u.realname
      , c.diary_num
      , s.school_name
   -- , a.diary_avg
   FROM ( SELECT d.user_id
               , COUNT(DISTINCT d.diary_id) AS diary_num
            FROM user_diary d
        ) c
   JOIN user u
     ON u.user_id = c.user_id
   JOIN school s
     ON s.school_id = u.school_id
   JOIN ( SELECT AVG(v.diary_num) AS diary_avg
            FROM ( SELECT t.user_id
                        , COUNT(DISTINCT t.diary_id) AS diary_num
                     FROM user_diary t
                 ) v
        ) a
     ON a.diary_avg < c.diary_num
  ORDER BY 1

The inline view aliased as c gets us the diary_num (count) for each user.

The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.

As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:

        ( SELECT COUNT(DISTINCT t.diary_id)
                 / NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
            FROM user v
            LEFT
            JOIN user_diary t
              ON t.user_id = v.user_id
        ) a

This yields slightly different results, as its a calculation on total counts, rather than an average of a calculation.


NOTE

The CROSS keyword has no influence on the MySQL optimizer.

We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As a reviwer, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product"... the author's inclusion of the CROSS keyword alerts us (the reviewer) that the omission of the ON clause was purposeful.

But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.


One more question: Does MySQL support for "View's SELECT contains a subquery in the FROM clause"?

A: Really old versions (3.x ?) of MySQL did not support subqueries. But certainly, MySQL 5.1 and later do support subqueries.

To answer your question, yes, a SELECT statement can be used as an inline view as a rowsource for another query, e.g.

SELECT v.*
  FROM (
         SELECT 1 AS foo
       ) v

Upvotes: 1

Bohemian
Bohemian

Reputation: 424983

You must calculate the average in a separate subquery.

Something like:

select ...
from ...
group by ...
having diary_num > (
    select avg(diary_num)
    from ...) 

You can fill in the blanks with what makes sense

Upvotes: 2

prosfilaes
prosfilaes

Reputation: 1308

You can't use avg that way. In my personal movie database,

select * from movie having year > avg(year);

produces nothing, and

select * from movie having year > (select avg (year) from movie);

produces the expected result.

Upvotes: 5

Related Questions