Select ONLY the most common value in a table (if repeated)

Question

I have three tables in MySQL: photos, tags and photo_tags with the following entries:

photos:

    ID |       NAME       | URL | CREATED |
    ---------------------------------------------
     1 | Red flowers      | ... |   ...   |
    ---------------------------------------------
     2 | Autumn trees     | ... |   ...   |
    ---------------------------------------------
     3 | Rain in the city | ... |   ...   |
    ---------------------------------------------

tags: (This table contains the same tag names repeated with different letter case)

    ID |       NAME        |   URL   | CREATED |
    --------------------------------------------
     1 |      nature       |   ...   |   ...   |
    --------------------------------------------
     2 |      Nature       |   ...   |   ...   |
    --------------------------------------------
     3 |      NATURE       |   ...   |   ...   |
    --------------------------------------------
     5 |      summer       |   ...   |   ...   |
    --------------------------------------------
     4 |      Summer       |   ...   |   ...   |
    --------------------------------------------
     6 |      Ocean        |   ...   |   ...   |
    --------------------------------------------

photo_tags:

   PHOTO_ID | TAG_ID | CREATED |
   -----------------------------
       1    |    2   |   ...   |
   -----------------------------
       2    |    2   |   ...   |
   -----------------------------
       3    |    1   |   ...   |
   -----------------------------
       4    |    3   |   ...   |
   -----------------------------
       5    |    2   |   ...   |
   -----------------------------
       6    |    3   |   ...   |
   -----------------------------
       7    |    6   |   ...   |

As shown in the photo_tags table, the following tag IDs are used: 2 (Nature), 1 (nature), 3 (NATURE) and 6 (Ocean).

If a tag name has no duplicates in the photo_tags table (for example Ocean), I select it normally. If it is duplicated under different letter case (for example Nature), I want to select only the most used variant.

For example, based on the photo_tags table above, a PHP var_dump() of the result should be:

array(2) {

    // Duplicated tag, so select the most common variant among (nature, Nature, NATURE)
    [0]=> array(1) { ["tag_name"]=> string(6) "Nature" }

    // Single tag, so select it normally
    [1]=> array(1) { ["tag_name"]=> string(5) "Ocean" }

}

But my code:

 $stmt = $this->db->prepare($sql);
 $stmt->execute();
 while($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
     $tags[] = [ 'tag_name' => $row['name'] ];
 }

It returns every tag whose ID appears in the photo_tags table.

GMB · Accepted Answer

When a duplicated tag name exists, you want only the most used name. Duplicated tags have the same name but different case (lower, upper, mixed).

If you are running MySQL 8.0, you can do this with a common table expression. You would first join tags with photo_tags and count the number of occurrences per tag, and then filter in the outer query on the most frequently occurring tag, independently of the case.

with cte as (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
)
select name
from cte c
where cnt = (select max(c1.cnt) from cte c1 where lower(c1.name) = lower(c.name))

Demo on DB Fiddle:

| name   |
| :----- |
| Nature |
| Ocean  |
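
To check the query above against the question's sample rows without a MySQL 8.0 server, here is a minimal sketch using Python's built-in sqlite3 module. It rests on some assumptions: an in-memory SQLite database stands in for MySQL, the tables are trimmed to the columns the query needs, and an `order by name` is added only to make the output deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL 8.0; only the columns
# the query touches are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tags (id integer primary key, name text);
    create table photo_tags (photo_id integer, tag_id integer);
    insert into tags (id, name) values
        (1, 'nature'), (2, 'Nature'), (3, 'NATURE'),
        (4, 'Summer'), (5, 'summer'), (6, 'Ocean');
    insert into photo_tags (photo_id, tag_id) values
        (1, 2), (2, 2), (3, 1), (4, 3), (5, 2), (6, 3), (7, 6);
""")

# The answer's CTE query; "order by name" is added for stable output.
CTE_QUERY = """
with cte as (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
)
select name
from cte c
where cnt = (select max(c1.cnt) from cte c1 where lower(c1.name) = lower(c.name))
order by name
"""

names = [row[0] for row in conn.execute(CTE_QUERY)]
print(names)  # ['Nature', 'Ocean']
```

Tag 2 (Nature) is used three times, tag 3 (NATURE) twice and tag 1 (nature) once, so only Nature survives the filter. The unused summer tags drop out at the inner join.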

In earlier versions, the query gets longer. You can aggregate in a subquery, and then use a not exists condition with a correlated aggregate query for filtering:

select name
from (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
) x
where not exists (
    select 1
    from tags t1
    inner join photo_tags pt1 on pt1.tag_id = t1.id
    where lower(t1.name) = lower(x.name)
    group by t1.id, t1.name
    having count(*) > x.cnt
)

Demo on DB Fiddle
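
One edge case worth knowing about: when two case variants are used equally often, neither count is strictly greater than the other, so the not exists filter keeps both. The sketch below runs the pre-8.0 query on the question's sample rows and on a tied data set. It makes a few assumptions: in-memory SQLite stands in for MySQL, `most_used_names` is a hypothetical helper, and the `order by name` is added only for stable output.

```python
import sqlite3

# The answer's pre-8.0 query; "order by name" is added for stable output.
NOT_EXISTS_QUERY = """
select name
from (
    select t.name, count(*) cnt
    from tags t
    inner join photo_tags pt on pt.tag_id = t.id
    group by t.id, t.name
) x
where not exists (
    select 1
    from tags t1
    inner join photo_tags pt1 on pt1.tag_id = t1.id
    where lower(t1.name) = lower(x.name)
    group by t1.id, t1.name
    having count(*) > x.cnt
)
order by name
"""

def most_used_names(tag_rows, photo_tag_rows):
    """Hypothetical helper: load rows into in-memory SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tags (id integer primary key, name text)")
    conn.execute("create table photo_tags (photo_id integer, tag_id integer)")
    conn.executemany("insert into tags (id, name) values (?, ?)", tag_rows)
    conn.executemany(
        "insert into photo_tags (photo_id, tag_id) values (?, ?)", photo_tag_rows
    )
    names = [row[0] for row in conn.execute(NOT_EXISTS_QUERY)]
    conn.close()
    return names

TAGS = [(1, "nature"), (2, "Nature"), (3, "NATURE"),
        (4, "Summer"), (5, "summer"), (6, "Ocean")]

# Sample rows from the question: Nature (3 uses) beats NATURE (2) and nature (1).
sample = most_used_names(
    TAGS, [(1, 2), (2, 2), (3, 1), (4, 3), (5, 2), (6, 3), (7, 6)]
)

# Tied rows: 'nature' and 'Nature' are each used once, so both are returned.
tied = most_used_names(TAGS, [(1, 1), (2, 2), (3, 6)])

print(sample)
print(tied)
```

The CTE version compares each count with the per-name maximum, so it treats ties the same way. If exactly one variant per name is required, the query needs an explicit tie-breaker, for example preferring the lowest tag id.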
