John Stempien
John Stempien

Reputation: 3

SQL Returning Double Rows

I have 2 tables in: users (user_id, fname, lname, department) and clock (id, punchType, punchTime, comment, user_id).

The SQL query below pulls 2 rows for some records and I can't figure out why. Any insight would be helpful.

SELECT user.user_id, user.fname, user.lname, user.department, punchType, punchTime, comment
    FROM user
    INNER JOIN (
        SELECT *
        FROM clock
        WHERE punchTime IN (
         SELECT MAX(punchTime) 
         FROM clock
         GROUP BY user_id
        )
       ) AS a
       ON user.user_id = a.user_id

Upvotes: 0

Views: 57

Answers (2)

JLuxton
JLuxton

Reputation: 461

You will notice that when you subquery by punchTime alone, you can end up with duplicate records per user. What happens is if any of a user's punchTimes match a max punch time, they stay in the set. So, if a user has a max time that matches someone else's max time, or the users has two+ records that represent their own max punch time, you will be joining multiple rows of the same user_id from clock with user table.

For example:

SELECT
  user_id,
  MAX(punchTime) as real_max_time,
  COUNT(1) as dupe_count,
  COUNT(DISTINCT(punchTime)) as unique_punchTimes
  COUNT(DISTINCT(punchType)) as unique_punchTypes
FROM clock
WHERE punchTime IN (
    SELECT MAX(punchTime) 
    FROM clock
    GROUP BY user_id
)
GROUP BY 
  user_id
HAVING COUNT(1) > 1

Otherwise you could have a duplicate user_id within your user table. Maybe one user has been in multiple departments? or changed names?

Find duplicated user_ids with the following:

SELECT
  user_id,
  COUNT(1) as duplicate_user_count
FROM user
GROUP BY user_id
HAVING COUNT(1) >1

Putting it all back together - find where the duplication is happening and then add other columns to you care about once resolved:

SELECT
    users.user_id,
    users.dupe_users,
    max_time.distinct_punchtimes,
    max_time.distinct_punchtypes,
    max_time.max_punchTime
FROM (
    SELECT
        user_id,
        COUNT(1) as dupe_users
    FROM user
    GROUP BY
      user_id 
) as users
INNER JOIN (
    SELECT
        user_id,
        COUNT(1) as clock_rows,
        COUNT(DISTINCT(punchTime)) as distinct_punchtimes,
        COUNT(DISTINCT(punchType)) as distinct_punchtypes,
        MAX(punchTime) max_punchTime
    FROM clock
    GROUP BY user_id
) as max_time
ON users.user_id = max_time.user_id

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269503

Because different users can have the same punch time. One user's punchtime could be another users maximum punchtime. Here is one fix:

        SELECT *
        FROM clock
        WHERE (user_id, punchTime) IN (
                SELECT user_id, MAX(punchTime) 
                FROM clock
                GROUP BY user_id
               );

This could also be fixed with correlated subqueries and other methods.

Upvotes: 1

Related Questions