Josh Gribbon
Josh Gribbon

Reputation: 75

Aggregate function on one column, group by on another, leave a third unaffected

I feel like this isn't too bad of a problem but I've been looking for a solution for the greater part of the day to no avail. Other solutions I've seen plenty of that don't seem to help me have been for getting columns that aren't unique values along with a group by and aggregate function.

The problem

I have a table of historical data as follows:

ID | source | value | date
---+--------+-------+-----------
1  | 12     | 10    | 2016-11-16
2  | 12     | 20    | 2015-11-16
3  | 12     | 30    | 2014-11-16
4  | 13     | 40    | 2016-11-16
5  | 13     | 50    | 2015-11-16
6  | 13     | 60    | 2014-11-16

I'm trying to get data before a certain date(within a loop to go different ranges), then getting the sum of the values grouped by source. So as an example "get all records before 30 days ago, and get the sum of the values of the unique sources, using the most recent dated entry for each".

So the first step was to remove entries with dates not in the range, an easy where date < getdate()-30 for example to get:

ID | source | value | date
---+--------+-------+-----------
2  | 12     | 20    | 2015-11-16
3  | 12     | 30    | 2014-11-16
5  | 13     | 50    | 2015-11-16
6  | 13     | 60    | 2014-11-16

Now my issue is finding a way to group by source and take the max date, and then sum up the result across all sources. The idea hear is that we don't know when the last entry is, so before the specified date we get all records, then take the newest entry for each unique source, and sum those up to get the total value at that time.

So the next step would be to group by source using the max of date, resulting in :

ID | source | value | date
---+--------+-------+-----------
2  | 12     | 20    | 2015-11-16
5  | 13     | 50    | 2015-11-16

And then the final step would be to sum the values, and then this process is repeated to get the sum value for multiple dates, so this would result in the row

 value | date
-------+-----------
 70    | getdate() - 30

to use for the rest.

Where I'm stuck

I'm trying to group by source and use the max of date to get the most recent entry for each unique source, but if I use the aggregate function or group by, then I can't preserve the ID or value columns to stick with the chosen max row. It's totally possible I'm just misunderstanding how aggregate functions work.

Progress so far

The best place I've gotten to yet is something like

with dataInDateRange as (
    select *
    from #historicalData hd
    where hd.date < getdate() - 30
)
select ???, max(date)
from dataInDateRange
group by source

But I'm not seeing how I can do this without somehow preserving a unique ID for the row that has the max date for each source so then I can go back and sum up the numbers.

Thank you great people for any help/guidance/lessons

Upvotes: 0

Views: 1119

Answers (1)

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

USE row_number()

with dataInDateRange as (
    select *
    from #historicalData hd
    where hd.date < getdate() - 30
), rows as (
    select *, 
           row_number() over (partition by source 
                              order by date desc) as rn
    from dataInDateRange
)
SELECT *
FROM rows 
WHERE rn = 1  

Upvotes: 2

Related Questions