Reputation: 1272
I have a huge database containing over 10 million entries.
I would like to find duplicates in that table and therefore I am using the following statement:
SELECT *
FROM `customer_positions_all`
WHERE `position_id` IN (
SELECT `position_id`
FROM `customer_positions_all`
GROUP BY `position_id`
HAVING count(`position_id`) > 1
)
ORDER BY `position_id`
I also have a "date" column, and I would like to only find the dups which are higher than a certain date (for example: date
> '2014-04-01').
I am not sure where do I put it inside the query... I keep getting errors.
Upvotes: 0
Views: 830
Reputation: 21757
Add the WHERE
clause before GROUP BY
to use it with a GROUP BY
. However, you can just do the filtering in the outer query, to filter from the list of IDs which are known to have duplicates like so:
SELECT *
FROM `customer_positions_all`
WHERE `position_id` IN (
SELECT `position_id`
FROM `customer_positions_all`
--WHERE date > '2014-04-01' --Redundant!
GROUP BY `position_id`
HAVING count(`position_id`) > 1
)
AND date > '2014-01-01'
ORDER BY `position_id`
Upvotes: 1