Reputation: 53
I have a table that looks something like this but with lots of rows:
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
| systemid | colcitrixid | colleague | time | date | week | period | year | usern |
| 32855671 | nameu02 | somename | 10:12 | 2019-11-01| 23 | 5 | 2019 | othname |
|
|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
(column names summaried and some missed out so it fits in this stackoverflow editor width.)
I would like to run a query to get a list of duplicate rows where the systemid, the colcitrixid and the date are all the same so I can put them into another table to work on.
Here is what I have so far:
SELECT
systemid,
colleaguecitrixid,
colleague,
TIME,
DATE,
WEEK,
period,
fincyear,
username,
COUNT(systemid) AS mids,
COUNT(colleaguecitrixid) AS colleagues,
GROUP_CONCAT(DISTINCT id SEPARATOR ',') AS ids
FROM
table_name_2019
WHERE
period = '9' AND calltype = 'Advice' AND enabled = '1' AND isarchived = '0' AND DATE <> '$todaysdate'
GROUP BY
systemid,
colleague,
DATE
HAVING
COUNT(systemid) > 1 AND COUNT(colleaguecitrixid) > 1
It almost works, but I'm missing something, because it is showing duplicates that include different colleaguecitrixid entries. It's doing it's result based on these rows from the orignal table like so
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
| systemid | colcitrixid | colleague | time | date | week | period | year | usern |
| 32852671 | nameu02 | Jack | 10:12 | 2019-11-01| 23 | 5 | 2019 | othname |
| 32852671 | nameu02 | Jack | 10:14 | 2019-11-01| 23 | 5 | 2019 | name5 |
| 32852671 | nameu09 | James | 10:14 | 2019-11-01| 23 | 5 | 2019 | name5 |
| 34855673 | nameu05 | Bob | 11:18 | 2019-11-03| 23 | 5 | 2019 | name2 |
| 34855673 | nameu05 | Bob | 11:18 | 2019-11-03| 23 | 5 | 2019 | othname |
| 32851672 | nameu08 | Sarah | 13:17 | 2019-11-01| 23 | 5 | 2019 | name2 |
| 32851672 | nameu08 | Sarah | 13:19 | 2019-11-01| 23 | 5 | 2019 | name3 |
| 32851672 | nameu08 | Sarah | 13:21 | 2019-11-01| 23 | 5 | 2019 | name1 |
| 32855671 | nameu06 | Jim | 10:19 | 2019-11-01| 23 | 5 | 2019 | othname |
| 32855671 | nameu06 | Jim | 10:22 | 2019-11-01| 23 | 5 | 2019 | othname |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
(Note, row 3, where we see James, a different colleague, but the system id is the same as the rows above with Jack)
but I would like it to only include the ones that are duplicated systemid AND colleaguecitrixid AND date, grouped together a bit like so:
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
| systemid | colcitrixid | colleague | time | date | week | period | year | usern |
| 32852671 | nameu02 | Jack | 10:12 | 2019-11-01| 23 | 5 | 2019 | othname |
| 32852671 | nameu02 | Jack | 10:14 | 2019-11-01| 23 | 5 | 2019 | name5 |
| 34855673 | nameu05 | Bob | 11:18 | 2019-11-03| 23 | 5 | 2019 | name2 |
| 34855673 | nameu05 | Bob | 11:18 | 2019-11-03| 23 | 5 | 2019 | othname |
| 32851672 | nameu08 | Sarah | 13:17 | 2019-11-01| 23 | 5 | 2019 | name2 |
| 32851672 | nameu08 | Sarah | 13:19 | 2019-11-01| 23 | 5 | 2019 | name3 |
| 32851672 | nameu08 | Sarah | 13:21 | 2019-11-01| 23 | 5 | 2019 | name1 |
| 32855671 | nameu06 | Jim | 10:19 | 2019-11-01| 23 | 5 | 2019 | othname |
| 32855671 | nameu06 | Jim | 10:22 | 2019-11-01| 23 | 5 | 2019 | othname |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----------+-------------+------------+-------+-----------+-------+---------+--------+---------+
(I'm showing the rows from the orignal table the result seems to be drawing from above, I understand the result of my query is not formatted like the above, I need the other parts and format of the query for later steps.)
Upvotes: 1
Views: 61
Reputation: 65228
If your DB's version is 10.2+, then consider using
COUNT(colleaguecitrixid) OVER (PARTITION BY systemid, colleague, date)
analytic function :
SELECT systemid,
colleaguecitrixid,
colleague,
time,
date,
week,
period,
fincyear,
username,
mids,
colleagues,
ids
FROM
(
SELECT t.*,
COUNT(systemid) AS mids,
COUNT(colleaguecitrixid) OVER (PARTITION BY systemid, colleague, date) AS colleagues,
GROUP_CONCAT(DISTINCT id SEPARATOR ',') AS ids
FROM table_name_2019 t
WHERE period = '9'
AND calltype = 'Advice'
AND enabled = '1'
AND isarchived = '0'
AND DATE <> '$todaysdate'
) t1
WHERE colleagues > 1
Upvotes: 0
Reputation: 1269753
systemid, the colcitrixid and the date are all the same
The query should look like this:
SELECT systemid, colcitrixid, date, COUNT(*),
GROUP_CONCAT(DISTINCT id SEPARATOR ',') AS ids
FROM table_name_2019
WHERE period = '9' AND
calltype = 'Advice' AND
enabled = '1' AND isarchived = '0' AND
DATE <> CURDATE()
GROUP BY systemid, colcitrixid, date
HAVING COUNT(*) > 1;
I don't see colcitrixid
in the data. If you really mean colleaguecitrixid
, then use that.
Note:
GROUP BY
should be arguments to aggregation columns.DISTINCT
in GROUP_CONCAT()
is probably unnecessary. A column called id
should not have duplicates.Upvotes: 1