notoriousnapper
notoriousnapper

Reputation: 23

SQL Query for grouping by user_id, and only getting the most recent (by time), returns limited object

This is my first question on stackoverflow, so any input on how I can improve my questions would be greatly appreciated. Also, I just started learning sql on my own time, so specific answers and explanations help. Thanks in advance!

I am trying to craft a specific query that pulls every unique user_id row such that it gets the newest entry for that user_id. Looking at other posts, I came up with this query:

SELECT user_id, max(time) as time 
FROM $session_name 
GROUP BY user_id;

The result I would like to see is:

[1] => stdClass Object
    (
        [user_id] => 3
        [lat] => 1 
        [lon] => 1
        [time] => 2016-03-27 11:30:24
    )

[2] => stdClass Object
    (
        [user_id] => 1234
        [lat] => 1 
        [lon] => 1
        [time] => 2016-03-27 11:29:46
    )

But instead I get:

[1] => stdClass Object
    (
        [user_id] => 3
        [time] => 2016-03-27 11:30:24
    )

[2] => stdClass Object
    (
        [user_id] => 1234
        [time] => 2016-03-27 11:29:46
    )

I figure that, in my SELECT statement, I need to enter the fields lat and lon, but after experimenting with different queries and reading different posts, the sql entries never executed properly.

Upvotes: 2

Views: 83

Answers (2)

maraca
maraca

Reputation: 8763

The problem with grouping is that you will need to join the original table again to get the rest of the columns. If you want to just get the latest row per user it's much easier with not exists (works in almost any DBMS):

SELECT user_id, lat, lon, time
FROM $session_name s
WHERE NOT EXISTS (SELECT * FROM $session_name where user_id = s.user_id AND time > s.time)

EXISTS takes any SELECT command and returns true if there are any rows returned. NOT is the negation of it, similar as in IS NULL and IS NOT NULL. Usually we don't want to just select from some table to get some values (then we could use IN maybe), but we want to get very specific rows. In the query above we "bind" the user_id to only get results for the same user_id (we have to use an alias, s above, to make it work because of ambiguity) and additionally we add the condition that the time in the NOT EXISTS query is after the time of the row we are currently looking at, because if there is no row with later time then the current row is the latest for the user. Note that this query might return several rows per user, if it is possible that a user has two entries with the exact same timestamp.

Upvotes: 0

Paul Grimshaw
Paul Grimshaw

Reputation: 21034

A sub query with window function should work, along these lines:

SELECT
    user_id,
    lat,
    long,
    time
FROM (
    SELECT
        user_id,
        lat,
        long,
        time,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time DESC) as row_no
     FROM @session_name 
) u
WHERE row_no = 1

Or alternatively using LAST_VALUE window function:

SELECT DISTINCT
    user_id,
    LAST_VALUE(lat)  OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lat,
    LAST_VALUE(long) OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as long,
    LAST_VALUE(time) OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as time
FROM @session_name

Upvotes: 1

Related Questions