sunilsurana
sunilsurana

Reputation: 133

MySQL Query: Aggregation at two different level

I have two tables

mysql> select * from report;
+----+----------+------------+------------------+-------------+
| id | campaign | advertiser | impression_count | click_count |
+----+----------+------------+------------------+-------------+
|  1 | camp1    | adv1       |               20 |           6 |
|  2 | camp2    | adv2       |               10 |           2 |
|  3 | camp1    | adv1       |               15 |           3 |
|  4 | camp2    | adv2       |                6 |           1 |
+----+----------+------------+------------------+-------------+
4 rows in set (0.00 sec)

mysql> select * from device;
+-----------+-----------+
| report_id | device_id |
+-----------+-----------+
|         1 | d1        |
|         1 | d2        |
|         2 | d1        |
|         2 | d3        |
|         2 | d4        |
|         3 | d2        |
|         3 | d4        |
|         4 | d3        |
|         4 | d4        |
|         4 | d5        |
+-----------+-----------+
10 rows in set (0.00 sec)

I want report which is aggregated at campaign and advertiser level which has sum of impression and click count and distinct device_ids. So I wrote below query

SELECT 
    campaign,
    advertiser,
    sum(impression_count),
    sum(click_count),
    count(DISTINCT device_id)
FROM report 
LEFT JOIN device ON report.id = device.report_id
GROUP BY campaign, advertiser;
+----------+------------+-----------------------+------------------+---------------------------+
| campaign | advertiser | sum(impression_count) | sum(click_count) | count(distinct device_id) |
+----------+------------+-----------------------+------------------+---------------------------+
| camp1    | adv1       |                    70 |               18 |                         3 |
| camp2    | adv2       |                    48 |                9 |                         4 |
+----------+------------+-----------------------+------------------+---------------------------+

Here because of join impression count and click_count is aggregated for multiple rows. What is want is

+----------+------------+-----------------------+------------------+---------------------------+
| campaign | advertiser | sum(impression_count) | sum(click_count) | count(distinct device_id) |
+----------+------------+-----------------------+------------------+---------------------------+
| camp1    | adv1       |                    35 |               9  |                         3 |
| camp2    | adv2       |                    16 |                3 |                         4 |
+----------+------------+-----------------------+------------------+---------------------------+

http://sqlfiddle.com/#!2/05dd9d/1

Found not so good solution

select campaign,advertiser,ic,cc,count(distinct device_id) 
from (
    select 
        group_concat(id) as id,
        sum(impression_count)as ic,
        sum(click_count)as cc,
        campaign,advertiser 
    FROM report har GROUP BY campaign,advertiser) a 
    LEFT JOIN device dr ON FIND_IN_SET(dr.report_id, a.id) 
    group by a.id
);

But this uses group concat so may have problems if the lenght of group_concat result is large.

Upvotes: 1

Views: 2448

Answers (2)

Sumurai8
Sumurai8

Reputation: 20737

What you want to do is do two distinct queries, then join the resulting sets. The outer select is just to select the information we actually want, and to join the two temporary tables on a common value. You could do this with id and report_id too if you didn't want to select the distinct devices that are in the device table for an entire campaign.

select `firsttable`.campaign, `firsttable`.advertiser, a, b, c from 
  (select id, campaign, advertiser, sum(impression_count) as a, sum(click_count) as b
   from report
   group by campaign, advertiser
  ) as firsttable
  left join
  (select campaign, advertiser, count(distinct device_id) as c
   from device, report
   where id=report_id
   group by campaign, advertiser
  ) as secondtable on `firsttable`.campaign=`secondtable`.campaign and
                      `firsttable`.advertiser=`secondtable`.advertiser;

SQLFiddle: http://sqlfiddle.com/#!2/8bd63/20

This query is a combination of these two temporary tables:

| ID | CAMPAIGN | ADVERTISER |   A |   B |
|----|----------|------------|-----|-----|
|  1 |    camp1 |       adv1 |  35 |   9 |
|  5 |    camp1 |       adv2 | 900 | 900 |
|  2 |    camp2 |       adv2 |  16 |   3 |

| CAMPAIGN | ADVERTISER | C |
|----------|------------|---|
|    camp1 |       adv1 | 3 |
|    camp2 |       adv2 | 4 |

Result:

| CAMPAIGN | ADVERTISER |   A |   B |      C |
|----------|------------|-----|-----|--------|
|    camp1 |       adv1 |  35 |   9 |      3 |
|    camp1 |       adv2 | 900 | 900 | (null) |
|    camp2 |       adv2 |  16 |   3 |      4 |

The problem with your query was that it would duplicate rows when combining the report table with the device table. You would end up with something like this:

| CAMPAIGN | ADVERTISER | IMPRESSION_COUNT | CLICK_COUNT | DEVICE_ID |
|----------|------------|------------------|-------------|-----------|
|    camp1 |       adv1 |               20 |           6 |        d1 |
|    camp1 |       adv1 |               20 |           6 |        d2 |
|    camp2 |       adv2 |               10 |           2 |        d1 |
|    camp2 |       adv2 |               10 |           2 |        d3 |
|    camp2 |       adv2 |               10 |           2 |        d4 |
|    camp1 |       adv1 |               15 |           3 |        d2 |
|    camp1 |       adv1 |               15 |           3 |        d4 |
|    camp2 |       adv2 |                6 |           1 |        d3 |
|    camp2 |       adv2 |                6 |           1 |        d4 |
|    camp2 |       adv2 |                6 |           1 |        d5 |
|    camp1 |       adv2 |              900 |         900 |    (null) |

Upvotes: 3

JSB
JSB

Reputation: 54

perhaps this helps you:

SELECT 
    campaign,
    advertiser,
    SUM(impression_count) AS ic,
    sum(click_count) as cc,
    (select 
            count(distinct device_id)
        from
            device
        where
            report_id = id) AS DD
from
    report
group by campaign , advertiser; 

Upvotes: 0

Related Questions