boop
boop

Reputation: 7787

How to get the last records for a combination of 2 columns?

I have a situation which I think can be compared to services like CamelCamelCamel, Keepa and so on. Lets say I track the price of an article on each day for a couple of countries. So my table, lets call it Trend, would look something like this

Id     Created      ArticleId    Country    Price
-------------------------------------------------
01     19/11/05     452          US         45.90
02     19/11/05     452          CA         52.99
03     19/11/05     452          MX         99.99
04     19/11/06     452          US         20.00
05     19/11/06     452          CA         25.00
06     19/11/06     452          MX         50.00
...                
97     19/11/05     738          US         12.99
98     19/11/05     738          CA         17.50
99     19/11/05     738          MX         45.50

So it's the next day and I want to update the Trend table. If the price in a country is still the same, I skip the article/country combination. If there is a new price I'll add a new record.

Now I want to query the table to get each ArticleId / Country combination. But only the last record of it (orderd by timestamp). So taken the example above I'd expect to get the records 04, 05 and 06 for ArticleId 452. Not 01, 02 and 03

So I start out with this basic query. But how do I get to change it to get my expected results?

SELECT
    *
FROM
    Trend
ORDER BY 
    Created DESC

Upvotes: 4

Views: 3044

Answers (2)

Derrick Moeller
Derrick Moeller

Reputation: 4970

You can do this with a combination of DISTINCT and CROSS APPLY.

SELECT DISTINCT ca.Id, ca.Created, t.ArticleId, t.Country, ca.Price
FROM Trend t
CROSS APPLY (SELECT TOP 1 Id, Created, Price
             FROM Trend
             WHERE ArticleId = t.ArticleId AND Country = t.Country
             ORDER BY Created DESC) ca

Often times when writing queries using APPLY your joining columns (ArticleId and Country) make up a unique key on another table. If this applies to your database you can drop the DISTINCT and speed up the query.

SELECT ca.Id, ca.Created, a.ArticleId, a.Country, ca.Price
FROM Article a
CROSS APPLY (SELECT TOP 1 Id, Created, Price
             FROM Trend
             WHERE ArticleId = a.ArticleId AND Country = a.Country
             ORDER BY Created DESC) ca

Lastly, if you're having issues with performance, you may want to create an index.

CREATE NONCLUSTERED INDEX [NC_Trend_ArticleId] ON [Trend]
(
    [ArticleId] ASC,
    [Country] ASC,
    [Created] ASC
)
INCLUDE ([Price])

Presumably Id is a PRIMARY KEY and is already covered by a CLUSTERED INDEX, if so, the above should be appropriate for most solutions.

Upvotes: 1

GMB
GMB

Reputation: 222682

One method uses a correlated subquery for filtering:

select t.*
from trend t
where t.created = (
    select max(t1.created) 
    from trend t1
    where t1.articleId = t.articleId and t1.country = t.country
)

For performance, you want an index on (articleId, country, created).

You might also want to consider the anti-left join approach:

select t.*
from trend t
left join trend t1 
    on  t1.articleId = t.articleId 
    and t1.country = t.country
    and t1.created > t.created
where t1.articleId is null

Finally, another typical solution is to join the table with an aggregate query:

select t.*
from trend t
inner join (
    select articleId, country, max(created) created
    from trend
    group by articleId, country
) t1 
    on  t1.articleId = t.articleId 
    and t1.country = t.country
    and t1.created = t.created

Which solution performs better depends on the size and distribution of your data.

Upvotes: 3

Related Questions