Reputation: 3
I have 2 tables in: users (user_id, fname, lname, department) and clock (id, punchType, punchTime, comment, user_id).
The SQL query below pulls 2 rows for some records and I can't figure out why. Any insight would be helpful.
SELECT user.user_id, user.fname, user.lname, user.department, punchType, punchTime, comment
FROM user
INNER JOIN (
SELECT *
FROM clock
WHERE punchTime IN (
SELECT MAX(punchTime)
FROM clock
GROUP BY user_id
)
) AS a
ON user.user_id = a.user_id
Upvotes: 0
Views: 57
Reputation: 461
You will notice that when you subquery by punchTime alone, you can end up with duplicate records per user. What happens is if any of a user's punchTimes match a max punch time, they stay in the set. So, if a user has a max time that matches someone else's max time, or the users has two+ records that represent their own max punch time, you will be joining multiple rows of the same user_id from clock with user table.
For example:
SELECT
user_id,
MAX(punchTime) as real_max_time,
COUNT(1) as dupe_count,
COUNT(DISTINCT(punchTime)) as unique_punchTimes
COUNT(DISTINCT(punchType)) as unique_punchTypes
FROM clock
WHERE punchTime IN (
SELECT MAX(punchTime)
FROM clock
GROUP BY user_id
)
GROUP BY
user_id
HAVING COUNT(1) > 1
Otherwise you could have a duplicate user_id within your user table. Maybe one user has been in multiple departments? or changed names?
Find duplicated user_ids with the following:
SELECT
user_id,
COUNT(1) as duplicate_user_count
FROM user
GROUP BY user_id
HAVING COUNT(1) >1
Putting it all back together - find where the duplication is happening and then add other columns to you care about once resolved:
SELECT
users.user_id,
users.dupe_users,
max_time.distinct_punchtimes,
max_time.distinct_punchtypes,
max_time.max_punchTime
FROM (
SELECT
user_id,
COUNT(1) as dupe_users
FROM user
GROUP BY
user_id
) as users
INNER JOIN (
SELECT
user_id,
COUNT(1) as clock_rows,
COUNT(DISTINCT(punchTime)) as distinct_punchtimes,
COUNT(DISTINCT(punchType)) as distinct_punchtypes,
MAX(punchTime) max_punchTime
FROM clock
GROUP BY user_id
) as max_time
ON users.user_id = max_time.user_id
Upvotes: 0
Reputation: 1269503
Because different users can have the same punch
time. One user's punchtime could be another users maximum punchtime. Here is one fix:
SELECT *
FROM clock
WHERE (user_id, punchTime) IN (
SELECT user_id, MAX(punchTime)
FROM clock
GROUP BY user_id
);
This could also be fixed with correlated subqueries and other methods.
Upvotes: 1