Reputation: 2392
I have three tables in MySQL: photos
, tags
and photo_tags
with the following entries:
photos:
ID | NAME | URL | CREATED |
--------------------------------------------
1 | Red flowers | ... | ... |
--------------------------------------------
2 | Autumn trees | ... | ... |
--------------------------------------------
3 | Rain in the city | ... | ... |
--------------------------------------------
tags: (This table contains repeated tag names with some differences)
ID | NAME | URL | CREATED |
--------------------------------------------
1 | nature | ... | ... |
--------------------------------------------
2 | Nature | ... | ... |
--------------------------------------------
3 | NATURE | ... | ... |
--------------------------------------------
5 | summer | ... | ... |
--------------------------------------------
4 | Summer | ... | ... |
--------------------------------------------
6 | Ocean | ... | ... |
--------------------------------------------
photo_tags:
PHOTO_ID | TAG_ID | CREATED |
-----------------------------
1 | 2 | ... |
-----------------------------
2 | 2 | ... |
-----------------------------
3 | 1 | ... |
-----------------------------
4 | 3 | ... |
-----------------------------
5 | 2 | ... |
-----------------------------
6 | 3 | ... |
-----------------------------
7 | 6 | ... |
As shown in the photo_tag
table, the following tagIDs are used: 2: Nature
, 1: nature
, 3: NATURE
and 6: Ocean
.
If a tag is only used once (not duplicated) in the photo_tags
table (for example Ocean) then I select it normally, but if it is repeated (for example Nature) then I want to select only the most used.
For example, if we rely on the photo_tags
table and do a PHP var_dump()
, it should be the following:
array(3) {
// Duplicated tag, then select most common tag between (nature, Nature, NATURE)
[0]=> array(1) { ["tag_name"]=> string(3) "Nature" }
// Single tag, then select normally
[1]=> array(1) { ["tag_name"]=> string(3) "Ocean" }
}
But my code:
<?php
$sql = ' SELECT t.`name`, COUNT( pt.`tagID` ) AS `pt_count` FROM `tags` t '.
' INNER JOIN `photo_tags` pt ON t.`tagID` = pt.`tagID` '.
' GROUP BY pt.`tagID` ORDER BY `pt_count` DESC ';
$tags = [ ];
$stmt = $this->db->prepare($sql);
$stmt->execute();
while($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
$tags[] = [ 'tag_name' => $row['name'] ];
}
It brings me all the tags whose tagID appears in the photo_tags
table
Upvotes: 0
Views: 76
Reputation: 222462
When a duplicated tag name exists, you want the most name only. Duplicated tags have the same name but different case (lower, upper, mixed).
If you are running MySQL 8.0, you can do this with a common table expression. You would first join tags
with photo_tag
and count the number of occurences per tag, and then filter in the outer query on the most occuring tag, independantly of the case.
with cte as (
select t.name, count(*) cnt
from tags t
inner join photo_tags pt on pt.tag_id = t.id
group by t.id, t.name
)
select name
from cte c
where cnt = (select max(c1.cnt) from cte c1 where lower(c1.name) = lower(c.name))
| name | | :----- | | Nature | | Ocean |
In earlier versions, the query gets longer. You can aggregate in a subquery, and then use a not exists conditition with a correlated aggregate query for filtering:
select name
from (
select t.name, count(*) cnt
from tags t
inner join photo_tags pt on pt.tag_id = t.id
group by t.id, t.name
) x
where not exists (
select 1
from tags t1
inner join photo_tags pt1 on pt1.tag_id = t1.id
where lower(t1.name) = lower(x.name)
group by t1.id, t1.name
having count(*) > x.cnt
)
Upvotes: 1
Reputation: 9308
This sounds like an XY problem. To work around the issue you could join on a more complex grouped subquery, either using the GROUP_CONCAT()
method, or window functions if your version of mysql is high enough.
I think the real issue is bad data. You're better off normalising it so the table has no duplicate tags to begin with. Then you can use a simple join. Either lowercase every tag, aggregate them, then put a unique key on that column, or Use LIKE
instead of =
when matching for updates and insertions to that table.
Upvotes: 1