Reputation: 23
This is my first question on stackoverflow, so any input on how I can improve my questions would be greatly appreciated. Also, I just started learning sql on my own time, so specific answers and explanations help. Thanks in advance!
I am trying to craft a specific query that pulls
every unique user_id
row such that it gets the newest entry for that user_id
. Looking at other posts, I came up with this query:
SELECT user_id, max(time) as time
FROM $session_name
GROUP BY user_id;
The result I would like to see is:
[1] => stdClass Object
(
[user_id] => 3
[lat] => 1
[lon] => 1
[time] => 2016-03-27 11:30:24
)
[2] => stdClass Object
(
[user_id] => 1234
[lat] => 1
[lon] => 1
[time] => 2016-03-27 11:29:46
)
But instead I get:
[1] => stdClass Object
(
[user_id] => 3
[time] => 2016-03-27 11:30:24
)
[2] => stdClass Object
(
[user_id] => 1234
[time] => 2016-03-27 11:29:46
)
I figure that, in my SELECT
statement, I need to enter the fields lat
and lon
, but after experimenting with different queries and reading different posts, the sql entries never executed properly.
Upvotes: 2
Views: 83
Reputation: 8763
The problem with grouping is that you will need to join the original table again to get the rest of the columns. If you want to just get the latest row per user it's much easier with not exists
(works in almost any DBMS):
SELECT user_id, lat, lon, time
FROM $session_name s
WHERE NOT EXISTS (SELECT * FROM $session_name where user_id = s.user_id AND time > s.time)
EXISTS
takes any SELECT
command and returns true
if there are any rows returned. NOT
is the negation of it, similar as in IS NULL
and IS NOT NULL
. Usually we don't want to just select from some table to get some values (then we could use IN
maybe), but we want to get very specific rows. In the query above we "bind" the user_id
to only get results for the same user_id
(we have to use an alias, s
above, to make it work because of ambiguity) and additionally we add the condition that the time in the NOT EXISTS
query is after the time of the row we are currently looking at, because if there is no row with later time then the current row is the latest for the user. Note that this query might return several rows per user, if it is possible that a user has two entries with the exact same timestamp.
Upvotes: 0
Reputation: 21034
A sub query with window function should work, along these lines:
SELECT
user_id,
lat,
long,
time
FROM (
SELECT
user_id,
lat,
long,
time,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time DESC) as row_no
FROM @session_name
) u
WHERE row_no = 1
Or alternatively using LAST_VALUE window function:
SELECT DISTINCT
user_id,
LAST_VALUE(lat) OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lat,
LAST_VALUE(long) OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as long,
LAST_VALUE(time) OVER (PARTITION BY user_id ORDER BY time DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as time
FROM @session_name
Upvotes: 1