Andy Huang

Reputation: 199

MySQL filter by GROUP BY result

I don't know whether this is something I can achieve efficiently with subqueries, or even how to build a query for it. I need to extract some information about people who are no longer using our system. Imagine we have 3 users with user IDs 1024, 1234, and 5678; users 1024 and 1234 are using theme A, and 5678 is using theme B:

$ SELECT * FROM user;  | $ SELECT * FROM user_theme;
+------+------+        | +------+-------+
|   id | name |        | | user | theme |
+------+------+        | +------+-------+
| 1024 | John |        | | 1024 |     A |
| 1234 | Jane |        | | 1234 |     A |
| 5678 | Jeff |        | | 5678 |     B |
+------+------+        | +------+-------+

The usage tracking table looks like this:

$ SELECT * FROM user_usage;
+----+------+---------------------+------+
| id | user | date                | uses |
+----+------+---------------------+------+
|  1 | 1234 | 2014-08-02 00:00:00 |    5 |
|  2 | 1234 | 2014-08-03 00:00:00 |    5 |
|  3 | 1234 | 2014-08-04 00:00:00 |    3 |
|  4 | 1234 | 2014-08-05 00:00:00 |    6 |
|  5 | 1024 | 2014-08-02 00:00:00 |    8 |
|  6 | 1024 | 2014-08-03 00:00:00 |    7 |
|  7 | 1024 | 2014-08-04 00:00:00 |    4 |
|  8 | 1024 | 2014-08-05 00:00:00 |    6 |
|  9 | 1024 | 2014-09-02 00:00:00 |    1 |
| 10 | 1024 | 2014-09-03 00:00:00 |    2 |
| 11 | 1024 | 2014-09-04 00:00:00 |    3 |
| 12 | 1024 | 2014-09-05 00:00:00 |    4 |
| 13 | 5678 | 2014-08-02 00:00:00 |    8 |
| 14 | 5678 | 2014-08-03 00:00:00 |    7 |
| 15 | 5678 | 2014-08-04 00:00:00 |    4 |
| 16 | 5678 | 2014-08-05 00:00:00 |    6 |
| 17 | 5678 | 2014-09-02 00:00:00 |    1 |
| 18 | 5678 | 2014-09-03 00:00:00 |    2 |
| 19 | 5678 | 2014-09-04 00:00:00 |    3 |
| 20 | 5678 | 2014-09-05 00:00:00 |    4 |
+----+------+---------------------+------+

I want a breakdown, grouped by theme, of how much usage has dropped off our system in 2014-09 (that is, users who have usage data in 2014-08 but none in 2014-09). So I want to write something like:

SELECT
    user_theme.theme,
    SUM(user_usage.uses) 'uses lost'
FROM
    user_theme
    LEFT JOIN user_usage 
        ON user_theme.user = user_usage.user
WHERE
    ...
GROUP BY
    user_theme.theme
# HAVING ...?

And get a result such as:

+-------+-----------+
| theme | uses lost |
+-------+-----------+
|     A |        19 |
|     B |         0 |
+-------+-----------+

Here, the 19 is SUM(uses) WHERE user = 1234 AND YEAR(date) = 2014 AND MONTH(date) = 8. I don't know in advance that user 1234 is the one whose SUM(uses) I care about; I only know to include user 1234 in that sum because SUM(uses) WHERE user = 1234 AND YEAR(date) = 2014 AND MONTH(date) = 9 is 0 (user 1234 has no rows in 2014-09).
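To make the rule concrete, here is a minimal Python sketch (no database involved) that applies it in memory to the sample rows above: a user counts as lost if they have usage in the previous month and no rows in the current month, and their previous-month uses are added to their theme's total. The names USER_THEME, USAGE and uses_lost_by_theme are illustrative only.

```python
# Sample data from the question.
USER_THEME = {1024: "A", 1234: "A", 5678: "B"}
USAGE = {  # user -> {month: uses on days 02..05}
    1234: {"2014-08": [5, 5, 3, 6]},
    1024: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
    5678: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
}


def uses_lost_by_theme(previous="2014-08", current="2014-09"):
    # Total uses per (user, 'YYYY-MM'), like SUM(uses) ... GROUP BY user, month.
    monthly = {
        (user, month): sum(daily)
        for user, months in USAGE.items()
        for month, daily in months.items()
    }
    # Every theme starts at 0 so themes without lost users still appear.
    result = {theme: 0 for theme in USER_THEME.values()}
    for user, theme in USER_THEME.items():
        # Lost: has usage in the previous month and no rows in the current one.
        if (user, previous) in monthly and (user, current) not in monthly:
            result[theme] += monthly[(user, previous)]
    return result


print(uses_lost_by_theme())  # {'A': 19, 'B': 0}
```

Only user 1234 qualifies (5 + 5 + 3 + 6 = 19 in 2014-08, nothing in 2014-09), which is where the 19 for theme A comes from.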

There are actually a lot of users and only a handful of themes (around 20K users and about 10 themes), so ideally I'd like to do the filtering directly in the database rather than in application code. Is there a way to do this efficiently in MySQL using raw SQL queries?

Upvotes: 0

Views: 67

Answers (1)

Iłya Bursov

Reputation: 24229

Here is a query that compares the current month with the previous one:

-- 'YYYY-MM' keys for the current and previous month
-- (for the example in the question, use '2014-09' and '2014-08')
set @current_month  = date_format(now(), '%Y-%m');
set @previous_month = date_format(date_sub(now(), interval 1 month), '%Y-%m');

select a.`theme`, sum(ifnull(b.uses_lost, 0)) as uses_lost
from
`user_theme` as a
left outer join
(
-- users active last month with no rows this month, with last month's total uses
select `user`, sum(uses) as uses_lost
from `user_usage`
where date_format(`date`, '%Y-%m') = @previous_month
  and `user` not in (
    select `user`
    from `user_usage`
    where date_format(`date`, '%Y-%m') = @current_month)
group by `user`
) as b
on (a.`user` = b.`user`)
group by a.`theme`;

Here is a fiddle to play with.
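For checking the query locally, the Python sketch below loads the question's sample rows into an in-memory SQLite database (standard-library sqlite3 module) and runs the same query shape. It only approximates MySQL, under two assumptions: SQLite lacks MySQL's year(), month() and date_format(), so strftime('%Y-%m', date) builds the month key, and the two months are bound parameters fixed to the question's 2014-08 and 2014-09 instead of being derived from now(). The function name uses_lost_by_theme is hypothetical.

```python
import sqlite3

# Sample data from the question: user -> {month: uses on days 02..05}.
USAGE = {
    1234: {"2014-08": [5, 5, 3, 6]},
    1024: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
    5678: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
}
THEMES = [(1024, "A"), (1234, "A"), (5678, "B")]

# Same shape as the MySQL query; ORDER BY added only for deterministic output.
QUERY = """
SELECT a.theme, SUM(IFNULL(b.uses_lost, 0)) AS uses_lost
FROM user_theme AS a
LEFT OUTER JOIN (
    SELECT user, SUM(uses) AS uses_lost
    FROM user_usage
    WHERE strftime('%Y-%m', date) = :prev_month
      AND user NOT IN (SELECT user FROM user_usage
                       WHERE strftime('%Y-%m', date) = :cur_month)
    GROUP BY user
) AS b ON a.user = b.user
GROUP BY a.theme
ORDER BY a.theme
"""


def uses_lost_by_theme(prev_month, cur_month):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_theme (user INTEGER, theme TEXT)")
    conn.execute("CREATE TABLE user_usage "
                 "(id INTEGER PRIMARY KEY, user INTEGER, date TEXT, uses INTEGER)")
    conn.executemany("INSERT INTO user_theme VALUES (?, ?)", THEMES)
    conn.executemany(
        "INSERT INTO user_usage (user, date, uses) VALUES (?, ?, ?)",
        [(user, f"{month}-{day:02d} 00:00:00", uses)
         for user, months in USAGE.items()
         for month, daily in months.items()
         for day, uses in enumerate(daily, start=2)])
    params = {"prev_month": prev_month, "cur_month": cur_month}
    rows = conn.execute(QUERY, params).fetchall()
    conn.close()
    return rows


print(uses_lost_by_theme("2014-08", "2014-09"))  # [('A', 19), ('B', 0)]
```

The LEFT OUTER JOIN plus IFNULL is what keeps theme B in the output with 0 even though none of its users dropped off.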

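The main idea is to find all users who used the system during the previous month and have no rows in the current month, sum their previous-month uses, and then LEFT JOIN those sums onto user_theme so that themes with no lost users still show 0.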
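Since the title asks about filtering by a GROUP BY result, an alternative to the NOT IN subquery is conditional aggregation with HAVING: group the usage rows by user once, count each user's rows in the current month with a CASE expression, and keep only users whose count is 0. The sketch below also filters with half-open date ranges (date >= the first day of the previous month AND date < the first day of the following month) instead of applying a function to the date column, which generally lets MySQL use an index on date if one exists; whether your schema has such an index is an assumption. It runs in SQLite via Python against the sample rows, so treat the SQL as a starting point to verify on MySQL, not a tested MySQL query. The function name uses_lost_by_theme is hypothetical.

```python
import sqlite3

# Sample data from the question: user -> {month: uses on days 02..05}.
USAGE = {
    1234: {"2014-08": [5, 5, 3, 6]},
    1024: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
    5678: {"2014-08": [8, 7, 4, 6], "2014-09": [1, 2, 3, 4]},
}
THEMES = [(1024, "A"), (1234, "A"), (5678, "B")]

QUERY = """
SELECT t.theme, COALESCE(SUM(l.uses_lost), 0) AS uses_lost
FROM user_theme AS t
LEFT JOIN (
    SELECT user,
           -- uses that fall in the previous month
           SUM(CASE WHEN date < :cur_start THEN uses ELSE 0 END) AS uses_lost
    FROM user_usage
    WHERE date >= :prev_start AND date < :next_start
    GROUP BY user
    -- filter on the grouped result: no rows at all in the current month
    HAVING SUM(CASE WHEN date >= :cur_start THEN 1 ELSE 0 END) = 0
) AS l ON l.user = t.user
GROUP BY t.theme
ORDER BY t.theme
"""


def uses_lost_by_theme(prev_start, cur_start, next_start):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_theme (user INTEGER, theme TEXT)")
    conn.execute("CREATE TABLE user_usage "
                 "(id INTEGER PRIMARY KEY, user INTEGER, date TEXT, uses INTEGER)")
    conn.executemany("INSERT INTO user_theme VALUES (?, ?)", THEMES)
    conn.executemany(
        "INSERT INTO user_usage (user, date, uses) VALUES (?, ?, ?)",
        [(user, f"{month}-{day:02d} 00:00:00", uses)
         for user, months in USAGE.items()
         for month, daily in months.items()
         for day, uses in enumerate(daily, start=2)])
    params = {"prev_start": prev_start, "cur_start": cur_start,
              "next_start": next_start}
    rows = conn.execute(QUERY, params).fetchall()
    conn.close()
    return rows


print(uses_lost_by_theme("2014-08-01", "2014-09-01", "2014-10-01"))
# [('A', 19), ('B', 0)]
```

This reads user_usage once instead of twice; on 20K users either form should be manageable, but comparing the two with EXPLAIN on the real data is the reliable way to choose.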

Upvotes: 1
