Edgar Derby

Reputation: 2825

Use MIN() where you cannot GROUP?

I feel pretty dumb, but I am stuck on what looks like a very easy query. I have a table like this, where every row is a user who watched a movie:

user_id     date    duration
   1      01-01-01    62m
   1      03-01-01    95m
   2      02-01-01    58m
   2      06-01-01    25m
   2      08-01-01    95m
   3      03-01-01    96m

What I would like is a table with the first movie watched by each user and its duration. The problem is that if I use MIN(), I have to GROUP BY both user_id and duration. But if I group by duration as well, I basically get the same table back. How can I solve this?

Upvotes: 1

Views: 128

Answers (9)

wildplasser

Reputation: 44240

If you want the first watch_date per user, there should be no date before this date for this user:

SELECT *
FROM watched_movies wm
WHERE NOT EXISTS (
  SELECT *
  FROM watched_movies nx
  WHERE nx.user_id = wm.user_id
  AND nx.watch_date < wm.watch_date
  );

Note: I replaced the date column by watch_date, since date is a reserved word (type name).

Upvotes: 1

productioncoder

Reputation: 4335

Try this query. I haven't tested it.

SELECT date, duration FROM tablename n
    WHERE NOT EXISTS(
        SELECT date, user_id FROM tablename g
        WHERE n.user_id = g.user_id AND g.date < n.date
        );

Upvotes: 2

Vulcronos

Reputation: 3456

If you are using SQL Server 2005 or later, you can use windowing functions.

SELECT *
FROM
(
SELECT user_id, date, duration, MIN(date) OVER(PARTITION BY user_id) AS MIN_DATE
FROM MY_TABLE
) AS RESULTS
WHERE date = MIN_DATE

The OVER clause with PARTITION BY effectively "groups by" user_id and computes the min date per user_id without eliminating any rows. You then select the rows where the date equals the min date, which leaves the first date per user_id. This is a common trick once you know about windowing functions.
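To make the two steps visible, here is a minimal sketch run against an in-memory SQLite database from Python (SQLite 3.25 or later is needed for window functions). The table name watched_movies, the watch_date column, and the ISO-formatted dates are assumptions for illustration, not taken from the question.

```python
import sqlite3

# Hypothetical sample data modelled on the question. ISO date strings are an
# assumption so that text comparison matches chronological order in SQLite.
rows = [
    (1, "2001-01-01", 62),
    (1, "2001-01-03", 95),
    (2, "2001-01-02", 58),
    (2, "2001-01-06", 25),
    (2, "2001-01-08", 95),
    (3, "2001-01-03", 96),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watched_movies (user_id INTEGER, watch_date TEXT, duration INTEGER)"
)
conn.executemany("INSERT INTO watched_movies VALUES (?, ?, ?)", rows)

# Step 1: the window function keeps every row and adds the per-user minimum.
with_min = conn.execute("""
    SELECT user_id, watch_date, duration,
           MIN(watch_date) OVER (PARTITION BY user_id) AS min_date
    FROM watched_movies
    ORDER BY user_id, watch_date
""").fetchall()

# Step 2: filtering on watch_date = min_date leaves the earliest row(s) per user.
first_rows = conn.execute("""
    SELECT user_id, watch_date, duration
    FROM (
        SELECT user_id, watch_date, duration,
               MIN(watch_date) OVER (PARTITION BY user_id) AS min_date
        FROM watched_movies
    ) AS results
    WHERE watch_date = min_date
    ORDER BY user_id
""").fetchall()

for row in with_min:
    print(row)
print(first_rows)
```

Note that if a user has two rows on the same earliest date, this filter returns both of them.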

Upvotes: 1

Declan_K

Reputation: 6826

Use a sub-query to get the min date then join that back to the table to get all other relevant columns.

SELECT  T2.user_id
        ,T2.date
        ,T2.duration
FROM    YourTable T2
INNER JOIN
        (
        SELECT  T1.user_id
                ,MIN(T1.date) as first_date
        FROM    YourTable T1
        GROUP BY T1.user_id
        ) SQ
ON      T2.user_id = SQ.user_id
AND     T2.date = SQ.first_date

Upvotes: 0

Srini V

Reputation: 11355

Try this:

WITH TABLE1
    AS (SELECT
             '1' AS USER_ID,
             '01-01-01' AS DT,
             62 AS DURATION
        FROM
             DUAL
        UNION ALL
        SELECT
             '1' AS USER_ID,
             '03-01-01' AS DT,
             95 AS DURATION
        FROM
             DUAL
        UNION ALL
        SELECT
             '2' AS USER_ID,
             '02-01-01' AS DT,
             58 AS DURATION
        FROM
             DUAL
        UNION ALL
        SELECT
             '2' AS USER_ID,
             '06-01-01' AS DT,
             25 AS DURATION
        FROM
             DUAL
        UNION ALL
        SELECT
             '2' AS USER_ID,
             '08-01-01' AS DT,
             95 AS DURATION
        FROM
             DUAL
        UNION ALL
        SELECT
             '3' AS USER_ID,
             '03-01-01' AS DT,
             96 AS DURATION
        FROM
             DUAL)
SELECT
      *
FROM
      (SELECT
            USER_ID,
            DT,
            DURATION,
            RANK ( ) OVER (PARTITION BY USER_ID ORDER BY DT ASC) AS ROW_RANK
       FROM
            TABLE1)
WHERE
      ROW_RANK = 1

Upvotes: 0

StevieG

Reputation: 8709

This should give you the duration of the first movie watched on the earliest date:

SELECT a.user_id, b.date, a.duration
FROM table a 
INNER JOIN (SELECT user_id,min(date) date FROM table GROUP BY user_id) b ON a.user_id = b.user_id AND a.date = b.date
INNER JOIN (SELECT user_id,date,min(session_id) session_id FROM table GROUP BY user_id, date) c ON b.user_id = c.user_id AND b.date = c.date AND a.session_id = c.session_id

Upvotes: 0

Tim Schmelter

Reputation: 460098

You can use a ranking function like ROW_NUMBER:

WITH CTE AS
(
  SELECT rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date ASC),
         user_id, date, duration
  FROM dbo.TableName
)
SELECT user_id, date, duration FROM CTE WHERE rn = 1

The advantage of ROW_NUMBER is that you can change the logic easily. For example, if you want to reverse the logic and get the row of the last watched film per user, you just have to change ORDER BY date ASC to ORDER BY date DESC.
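As a rough illustration of that switch, the sketch below runs the same CTE twice against an in-memory SQLite database from Python, changing only the sort direction. The helper name pick_per_user, the table name watched_movies, the watch_date column, and the ISO dates are all assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watched_movies (user_id INTEGER, watch_date TEXT, duration INTEGER)"
)
# Hypothetical rows modelled on the question, with ISO dates so text sorts by time.
conn.executemany(
    "INSERT INTO watched_movies VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-03", 95),
        (2, "2001-01-02", 58),
        (2, "2001-01-06", 25),
        (2, "2001-01-08", 95),
        (3, "2001-01-03", 96),
    ],
)


def pick_per_user(direction):
    """Return one row per user: the first row when sorting by date in `direction`."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    query = f"""
        WITH cte AS (
            SELECT ROW_NUMBER() OVER (
                       PARTITION BY user_id ORDER BY watch_date {direction}
                   ) AS rn,
                   user_id, watch_date, duration
            FROM watched_movies
        )
        SELECT user_id, watch_date, duration
        FROM cte
        WHERE rn = 1
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


first_watched = pick_per_user("ASC")   # earliest film per user
last_watched = pick_per_user("DESC")   # latest film per user
print(first_watched)
print(last_watched)
```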

The advantage of the CTE (common table expression) is that you can also use it to delete or update these records; this is often used to identify or delete duplicates. You can run a SELECT first to see what will be deleted or updated before you execute it.
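A sketch of that preview-then-delete workflow, run against an in-memory SQLite database from Python. In SQL Server the CTE itself can be the target of the DELETE; SQLite does not allow deleting through a CTE, so this version deletes by a key column instead. The id column, the table name watched_movies, and the duplicated sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE watched_movies (
        id INTEGER PRIMARY KEY,   -- hypothetical surrogate key
        user_id INTEGER,
        watch_date TEXT,
        duration INTEGER
    )
""")
# Hypothetical data containing exact duplicates.
conn.executemany(
    "INSERT INTO watched_movies (user_id, watch_date, duration) VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-01", 62),  # duplicate
        (2, "2001-01-02", 58),
        (3, "2001-01-03", 96),
        (3, "2001-01-03", 96),  # duplicate
        (3, "2001-01-03", 96),  # duplicate
    ],
)

# Number the rows inside each group of identical values; rn > 1 marks duplicates.
DUPLICATES_CTE = """
    WITH cte AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, watch_date, duration ORDER BY id
               ) AS rn
        FROM watched_movies
    )
"""

# Preview first: which rows would be removed?
preview = conn.execute(
    DUPLICATES_CTE + "SELECT id FROM cte WHERE rn > 1 ORDER BY id"
).fetchall()

# Then delete exactly those rows, using the same CTE.
conn.execute(
    DUPLICATES_CTE
    + "DELETE FROM watched_movies WHERE id IN (SELECT id FROM cte WHERE rn > 1)"
)

remaining = conn.execute(
    "SELECT user_id, watch_date, duration FROM watched_movies ORDER BY id"
).fetchall()
print(preview)
print(remaining)
```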

Upvotes: 2

John Woo

Reputation: 263703

You can use ROW_NUMBER(), a ranking function that generates a sequential number for the rows within each group, based on the column you sort by. In this case, if there is a tie, only one record per user is selected; if you want to select all of the tied records, use DENSE_RANK() rather than ROW_NUMBER().

SELECT  user_id, date, duration
FROM    
        (
            SELECT  user_id, date, duration,
                    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
              FROM tableName
        ) a
WHERE rn = 1

This also assumes that the data type of the date column is DATE.
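To see the tie behaviour described above, here is a small sketch against an in-memory SQLite database from Python. User 1 has two rows on the same earliest date. The helper name first_rows, the table name watched_movies, the watch_date column, and the sample values are assumptions for illustration; dates are stored as ISO text because SQLite has no DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watched_movies (user_id INTEGER, watch_date TEXT, duration INTEGER)"
)
# Hypothetical data: user 1 watched two films on the same first date (a tie).
conn.executemany(
    "INSERT INTO watched_movies VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-01", 40),
        (1, "2001-01-03", 95),
        (2, "2001-01-02", 58),
        (2, "2001-01-06", 25),
    ],
)


def first_rows(ranking_function):
    """Return the rows ranked 1 per user by the given ranking function."""
    if ranking_function not in ("ROW_NUMBER", "DENSE_RANK"):
        raise ValueError("unsupported ranking function")
    query = f"""
        SELECT user_id, watch_date, duration
        FROM (
            SELECT user_id, watch_date, duration,
                   {ranking_function}() OVER (
                       PARTITION BY user_id ORDER BY watch_date
                   ) AS rn
            FROM watched_movies
        ) AS a
        WHERE rn = 1
        ORDER BY user_id, duration
    """
    return conn.execute(query).fetchall()


one_per_user = first_rows("ROW_NUMBER")   # ties broken arbitrarily
all_ties = first_rows("DENSE_RANK")       # every row sharing the first date
print(one_per_user)
print(all_ties)
```

With ROW_NUMBER(), which of the two tied rows survives for user 1 is not defined unless you add a tie-breaker column to the ORDER BY.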

Upvotes: 1

Derek

Reputation: 23228

Assuming there can only be a single record per user per date, it'd be something like this:

select t.*
from table t
inner join (
  select user_id, min(date) mindate
  from table
  group by user_id
) t1
  on  t.user_id = t1.user_id
      and t.date = t1.mindate

Upvotes: 1
