Héctor M.
Héctor M.

Reputation: 2392

Select ONLY the most common value in a table (if repeated)

I have three tables in MySQL: photos, tags and photo_tags with the following entries:

photos:

    ID |     NAME        | URL | CREATED |
    --------------------------------------------
     1 | Red flowers     | ... |   ...   |
    --------------------------------------------
     2 | Autumn trees    | ... |   ...   |
    --------------------------------------------
     3 | Rain in the city | ... |   ...   |
    --------------------------------------------

tags: (This table contains repeated tag names with some differences)

    ID |       NAME        |   URL   | CREATED |
    --------------------------------------------
     1 |      nature       |   ...   |   ...   |
    --------------------------------------------
     2 |      Nature       |   ...   |   ...   |
    --------------------------------------------
     3 |      NATURE       |   ...   |   ...   |
    --------------------------------------------
     5 |      summer       |   ...   |   ...   |
    --------------------------------------------
     4 |      Summer       |   ...   |   ...   |
    --------------------------------------------
     6 |      Ocean        |   ...   |   ...   |
    --------------------------------------------

photo_tags:

   PHOTO_ID | TAG_ID | CREATED |
   -----------------------------
       1    |    2   |   ...   |
   -----------------------------
       2    |    2   |   ...   |
   -----------------------------
       3    |    1   |   ...   |
   -----------------------------
       4    |    3   |   ...   |
   -----------------------------
       5    |    2   |   ...   |
   -----------------------------
       6    |    3   |   ...   |
   -----------------------------
       7    |    6   |   ...   |

As shown in the photo_tag table, the following tagIDs are used: 2: Nature, 1: nature, 3: NATURE and 6: Ocean.

If a tag is only used once (not duplicated) in the photo_tags table (for example Ocean) then I select it normally, but if it is repeated (for example Nature) then I want to select only the most used.

For example, if we rely on the photo_tags table and do a PHP var_dump(), it should be the following:

array(3) {

    // Duplicated tag, then select most common tag between (nature, Nature, NATURE)
    [0]=> array(1) { ["tag_name"]=> string(3) "Nature" }

    // Single tag, then select normally
    [1]=> array(1) { ["tag_name"]=> string(3) "Ocean" }

}

But my code:

 <?php

 $sql = ' SELECT t.`name`, COUNT( pt.`tagID` ) AS `pt_count` FROM `tags` t '.
        ' INNER JOIN `photo_tags` pt ON t.`tagID` = pt.`tagID` '.
        ' GROUP BY pt.`tagID` ORDER BY `pt_count` DESC ';

 $tags = [ ];
 $stmt = $this->db->prepare($sql);
 $stmt->execute();
 while($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
     $tags[] = [ 'tag_name' => $row['name'] ];
 }

It brings me all the tags whose tagID appears in the photo_tags table

Upvotes: 0

Views: 76

Answers (2)

GMB
GMB

Reputation: 222462

When a duplicated tag name exists, you want the most name only. Duplicated tags have the same name but different case (lower, upper, mixed).

If you are running MySQL 8.0, you can do this with a common table expression. You would first join tags with photo_tag and count the number of occurences per tag, and then filter in the outer query on the most occuring tag, independantly of the case.

with cte as (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
)
select name
from cte c
where cnt = (select max(c1.cnt) from cte c1 where lower(c1.name) = lower(c.name))

Demo on DB Fiddle:

| name   |
| :----- |
| Nature |
| Ocean  |

In earlier versions, the query gets longer. You can aggregate in a subquery, and then use a not exists conditition with a correlated aggregate query for filtering:

select name
from (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
) x
where not exists (
    select 1
    from tags t1
    inner join photo_tags pt1 on pt1.tag_id = t1.id
    where lower(t1.name) = lower(x.name)
    group by t1.id, t1.name
    having count(*) > x.cnt
)

Demo on DB Fiddlde

Upvotes: 1

Walf
Walf

Reputation: 9308

This sounds like an XY problem. To work around the issue you could join on a more complex grouped subquery, either using the GROUP_CONCAT() method, or window functions if your version of mysql is high enough.

I think the real issue is bad data. You're better off normalising it so the table has no duplicate tags to begin with. Then you can use a simple join. Either lowercase every tag, aggregate them, then put a unique key on that column, or Use LIKE instead of = when matching for updates and insertions to that table.

Upvotes: 1

Related Questions