tohhy
tohhy

Reputation: 145

How to select unique rows by certain columns and for each date in SQL?

I have a database table which looks like the following:

id | project_id | stat_date | source | medium | campaign | impressions
----------------------------------------------------------------------
 5 |       2312 | 2019-08-30| google |   cpc  |  camp_a  |      20
 4 |       2312 | 2019-08-30| google |   cpc  |  camp_a  |      20
 3 |       2312 | 2019-08-30| google | organic|  camp_b  |      12
 2 |       2312 | 2019-08-29| google |   cpc  |  camp_a  |      35
 1 |       2312 | 2019-08-29| google |   cpc  |  camp_c  |      44
...|        ... |        ...|    ... |   ...  |   ...    |      ...

What I need to get is (without id, stat_date columns) between 29th and 30th (dates interval may vary):

 project_id | source | medium | campaign | impressions
------------------------------------------------------
       2312 | google |   cpc  |   camp_a |      55
       2312 | google | organic|   camp_b |      12
       2312 | google |   cpc  |   camp_c |      44

As you see, I want to get rid of duplicate rows with impressions of 20 and get the sum of 'impressions' column at the end.

So what query may I use to achieve that kind of result?

Upvotes: 1

Views: 48

Answers (1)

GMB
GMB

Reputation: 222512

You seem to be looking for a simple aggregate query with a filter in the WHERE clause:

SELECT
    project_id,
    source,
    medium,
    campaign,
    SUM(DISTINCT impressions) impressions
FROM mytable
WHERE stat_date >= '2019-08-29' AND stat_date <= '2019-08-30'
GROUP BY 
    project_id,
    source,
    medium,
    campaign

However, it should be noted that the above query assumes that the same impressions count does not occur on two distinct days for the same project_id/source/medium/campaign. While this works for your sample data, it might not be exactly what you need.

The following query might give you a better result for that edge case (it actually removes duplicates before aggregating):

SELECT
    project_id,
    source,
    medium,
    campaign,
    SUM(impressions) impressions
FROM (
    SELECT DISTINCT
        stat_date,
        project_id,
        source,
        medium,
        campaign,
        impressions
    FROM mytable 
) x
WHERE stat_date >= '2019-08-29' AND stat_date <= '2019-08-30'
GROUP BY 
    project_id,
    source,
    medium,
    campaign

In this demo on DB Fiddle, both queries return:

| project_id | source | medium  | campaign | impressions |
| ---------- | ------ | ------- | -------- | ----------- |
| 2312       | google | cpc     | camp_a   | 55          |
| 2312       | google | cpc     | camp_c   | 44          |
| 2312       | google | organic | camp_b   | 12          |

Upvotes: 1

Related Questions