Khanzor
Khanzor

Reputation: 5010

TSQL difficult join issue

I struggling with a problem I have in TSQL, I need to get the top 10 results for each user from a table that might contain more than 10 results.

My natural (and procedurally minded) approach is "for each user in table T select the top 10 results ordered by date".

Each time I try to formulate the question in my mind in a set based approach, I keep running into the term "foreach".

Is it possible to do something like this:

SELECT *
FROM table AS t1
INNER JOIN (
    SELECT TOP 10 *
    FROM table AS t2
    WHERE t2.id = t1.id
    ORDER BY date DESC
)

Or even

SELECT (    SELECT TOP 10 *
             FROM table AS t2
             WHERE t2.id = t1.id
             ORDER BY date    )
FROM table AS t1

Or is there another solution to this using temp tables that I should think about?

EDIT:

Just to be perfectly clear - I need to the top 10 results for each user in the table, e.g. 10 * N where N = number of users.

EDIT:

In response to a suggestion made by RBarryYoung, I'm having an issue, which is best demonstrated with code:

CREATE TABLE #temp (id INT, date DATETIME)

INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())

SELECT *
FROM #temp AS t1
CROSS APPLY (
 SELECT TOP 1 *
 FROM #temp AS t2
 WHERE t2.id = t1.id
 ORDER BY t2.date DESC
) AS t2

DROP TABLE #temp

Running this, you can see that this doesn't limit the results to the TOP 1... Am I doing something wrong here?

EDIT:

It seems my last example provided a bit of confusion. Here is an example showing what I want to do:

CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (2, GETDATE())

SELECT *
FROM #temp AS t1
CROSS APPLY
(
    SELECT TOP 2 *
 FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2

DROP TABLE #temp

This outputs:

1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
2 2009-08-26 09:05:56.583 2 2009-08-26 09:05:56.583

If I use distinct:

SELECT DISTINCT t1.id
FROM #temp AS t1
CROSS APPLY
(
    SELECT TOP 2 *
 FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2

I get

1
2

I need

1
1
2

Does anyone know if this is possible?

EDIT:

The following code will do this

WITH RowTable AS
(
SELECT 
 id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
FROM #temp 
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;

I posted in the comments, but there is no code formatting, so it doesn't look very nice.

Upvotes: 1

Views: 1752

Answers (5)

RBarryYoung
RBarryYoung

Reputation: 56755

Yes, there are several differet good ways to do this in 2005 and 2008. The one most similar to what you are already trying is with CROSS APPLY:

SELECT T2.*
FROM (
    SELECT DISTINCT ID FROM table
) AS t1
CROSS APPLY (
    SELECT TOP 10 *
    FROM table AS t2
    WHERE t2.id = t1.id
    ORDER BY date DESC
) AS t2
ORDER BY T2.id, date DESC

This then returns the ten most recent entries in [table] (or as many as exist, up to 10), for each distinct [id]. Asumming that [id] corresponds to a user, then this should be exactly what you are asking for.

(edit: slight changes because I did not take into account that T1 and T2 were the same tables and thus there will be multiple duplicate t1.IDs matching multiple duplicate T2.ids.)

Upvotes: 4

forrest
forrest

Reputation:

select userid, foo, row_number() over (partition by userid order by foo)  as rownum from table where rownum <= 10

Upvotes: 2

Bill Karwin
Bill Karwin

Reputation: 562861

Here's a trick I use to do this "top-N-per-group" type of query:

SELECT t1.id
FROM table t1 LEFT OUTER JOIN table t2 
 ON (t1.user_id = t2.user_id AND (t1.date > t2.date
     OR t1.date = t2.date AND t1.id > t2.id))
GROUP BY t1.id
HAVING COUNT(*) < 10
ORDER BY t1.user_id, COALESCE(COUNT(*), 0);

Upvotes: 0

Jeff Leonard
Jeff Leonard

Reputation: 3294

I believe this SO question will answer your question. It's not answering exactly the same question, but I think the solution will work for you too.

Upvotes: 0

Russell
Russell

Reputation: 17739

It is possible, however using nested queries will be slower.

The following will also find the results you are looking for:

SELECT TOP 10 * 
FROM table as t1
INNER JOIN table as t2 
  ON t1.id = t2.id
ORDER BY date DESC

Upvotes: 0

Related Questions