Reputation: 325

Self Join to find duplicates but including all columns

I would like to match any entries in a logs table which occurs having the same day and cause which are in the table more than once. I wrote the query for that fetches already the duplicates, my problem is that I need to have access to all the columns in the result from the table for later JOINs. Table looks like that:

| ID | DATE       | CAUSE | USER | ... |
|--------------------------------------|
| x  | 2017-01-01 | aaa   | 100  | ... |
| x  | 2017-01-02 | aaa   | 101  | ... |
| x  | 2017-01-03 | bbb   | 101  | ... |
| x  | 2017-01-03 | bbb   | 101  | ... |
| x  | 2017-01-04 | ccc   | 101  | ... |
| x  | 2017-01-04 | ccc   | 101  | ... |
| x  | 2017-01-04 | ccc   | 101  | ... |
| x  | 2017-01-05 | aaa   | 101  | ... |
| .....................................|
| .....................................|
| .....................................|

Query:

SELECT logs.* FROM 
    (SELECT day, cause FROM logs 
         GROUP BY day, cause HAVING COUNT(*) > 1) AS logsTwice, logs 
WHERE logsTwice.day = logs.day AND logsTwice.cause = logs.cause

The sub select fetches exactly the right data (date and cause) but when I try to get the additional columns of these matches I get completely wrong data. What am I doing wrong?

Upvotes: 0

Answers (3)

batsz

Reputation: 65

You can try

SELECT l1.*
  FROM logs l1
 INNER JOIN logs l2
    ON (l1.id <> l2.id
        AND l1.day = l2.day
        AND l1.cause = l2.cause
        AND l1.user <> l2.user);

Upvotes: 0

Gordon Linoff

Reputation: 1270553

You can just use window functions:

SELECT l.*
FROM (SELECT l.*,
             COUNT(*) OVER (PARTITION BY day, cause) as cnt
      FROM logs l
     ) l
WHERE cnt > 1;

In general, window functions will have better performance than the equivalent query using JOIN and GROUP BY.

Upvotes: 2

Oto Shavadze

Reputation: 42803

Try this:

SELECT logs.* FROM logs
inner join 
(SELECT day, cause FROM logs GROUP BY day, cause HAVING COUNT(*) > 1) logsTwice
on logsTwice.day = logs.day AND logsTwice.cause = logs.cause

Upvotes: 1

Self Join to find duplicates but including all columns

Answers (3)

Related Questions