Run
Run

Reputation: 57266

GROUP a result BY a specific keyword in MySQL?

I have a page tagged with multiple tags with the keyword I am searching and sometimes it is not tagged with that keyword, so when it has that tags, it will return a result like this below,

query,

SELECT*
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id
AND t.tag_name LIKE '%story%'

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'

ORDER BY (t.tag_name+0) ASC

result,

page_id     page_url            tag_name    
17          article title 8     NULL
17          article title 8     NULL
17          article title 8     sys-rsv-story-1

so I have to use GROUP BY to solve this problem,

SELECT*
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id
AND t.tag_name LIKE '%story%'

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'

GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

and it returns something like this,

page_id     page_url            tag_name    
17          article title 8     NULL

But I am after this result which it has the keyword that I am searching for,

page_id     page_url            tag_name    
17          article title 8     sys-rsv-story-1

So, is it possible to group a result by a keyword? Or other better queries to archive this?

Also, it should not return the result if that keyword isn't there, but it still does,

page_id     page_url            tag_name    
    17          article title 8     NULL
    17          article title 8     NULL

EDIT:

My new solution,

 SELECT*
FROM root_pages AS p

INNER JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

INNER JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_title LIKE '%{group1}%'
AND t.tag_name LIKE '%story%'
AND p.page_hide != '1'

AND EXISTS (
    SELECT page_url
    FROM root_pages AS p

    LEFT JOIN root_mm_pages_tags AS mm
    ON mm.page_id = p.page_id

    LEFT JOIN root_tags AS t
    ON t.tag_id =  mm.tag_id

    WHERE page_url = 'article title 1d'
    AND t.tag_name LIKE '%story%'
    AND p.page_hide != '1'
)

ORDER BY (t.tag_name+0) ASC

Upvotes: 3

Views: 553

Answers (3)

regilero
regilero

Reputation: 30526

Oucha.

Your SQl queries are quite strange I think.

Several thing to notice:

  • using bar LIKE '%foo%' is very hard for the SQL engine, he must sequentially scan all rows and search the substring 'foo' in the column bar. Index usage is not available. So avoid it if you can. Use at least bar LIKE 'foo%' if you can (index available if you have the start). And in you case you could have pages with a title 'article title 80' matching, are you sure you do not simply need a p.page_title = 'article title 8'?
  • why do you make a +0 in the order by instruction? Do you really want to prevent index usage?
  • p.page_hide != '1', p.page_hide isn't a tinyint? it's a string? why using UTF8 encoded characters to store 0 or 1?

But this is not the problem.

One of your problem is that using a group by GROUP BY p.page_id is in fact wrong in SQL but MySQL hides this fact. A group by instruction should contain at least every element which is not an aggegate in the SELECT part (an aggregate is count or sum, or avg, etc). Here you group by id and you obtain a random thing, MySQL thinks you know what you're doing and you're sure every other field in the select is the same when the id is the same (which is not the case, the tag_name differs).

And if you have several tags matching your keyword ('story' here) don't you want the page to be listed several times? with all tags?

So.

You want to select a page, where you have a tag. I would say use EXISTS keyword and make things simplier.

It may be something like that:

SELECT * 
 FROM root_pages AS p
WHERE p.page_title = 'article title 8'
 AND p.page_hide != 1
 -- exists will return true as soon as the engine find one matching row
 AND EXISTS (
  SELECT mm.page_id
  FROM root_mm_pages_tags AS mm
    LEFT JOIN root_tags AS t
      ON t.tag_id =  mm.tag_id
  -- here we make a correlation between the subquery and the main query
  WHERE mm.page_id = p.page_id
  AND t.tag_name LIKE '%story%'
)

But with this query you only obtain the page name, not the tag result. And if you want to list all matching tags for a page you need another query, quite near of what you have:

SELECT p.page_id, p.page_name, t.tag_name
 FROM root_pages AS p
   INNER JOIN root_mm_pages_tags AS mm
       ON mm.page_id = p.page_id
     INNER JOIN root_tags AS t
         ON (t.tag_id =  mm.tag_id 
         AND t.tag_name LIKE '%story%')
WHERE p.page_title = 'article title 8'
 AND p.page_hide != 1

With the first INNER JOIN I'm only keeping pages which have tags. With the second INNER JOIN I'm only keeping rows from root_mm_pages having a matching tag in root_tags. I think your NULL came from rows in this tables linked to other unmatching tags (so having NULL field in root_tags table result for you query). So do not use LEFT JOIN if you only want matchings results.

If you want only one result for each table you will need a GROUP BY p.page_id, p.page_name and you will need to add an aggregate function on the remaining field t.tag_name. You could use GROUP_CONTACT(t.tag_name ORDER BY t.tag_name ASC SEPARATOR ",") to obtain a list of all matching tags for this table.

EDIT

So it seems in fact you want pages with matching title OR pages with matching keyword. In this case you should use LEFT JOIN, and you will have NULL values. If you do not need the tag in the result the EXISTS keyword is still your best friend, just replace the AND EXISTS with OR EXISTS. It's the fastest solution.

If you need the matching tags in the result or NULL when they were no tags you have 2 solutions. A UNION query mixing results from a simple query on titles and a query on tags with inner joins, or doing the nice group by with GROUP_CONCAT. If you do not use GROUP_CONCAT (as in @Dmitry Teplyakov answer) you will maybe obtain results where the page title did'nt match, only the keyword, but the tag_name field will show NULL as the first tag_row listed before the GROUP BY is applied on the query is a NULL field -- the page as 3 keywords, the matching keyword is not the first in the query --.

SELECT 
 p.page_id,
 p.page_name,
 GROUP_CONCAT(t.tag_name ORDER BY t.tag_name ASC SEPARATOR ",")
FROM root_pages AS p
   LEFT JOIN root_mm_pages_tags AS mm
       ON mm.page_id = p.page_id
     LEFT JOIN root_tags AS t
         ON t.tag_id =  mm.tag_id 
WHERE p.page_hide != 1
 AND (p.page_title = 'article title 8'
  OR t.tag_name LIKE '%story%')
GROUP BY p.page_id, p.page_name;

But here we loose your order by tag_name. Ordering by tag_name means you wants the same page appearing in several rows if it is matching the keyword several times. Or if the name is matching and the keyword also... or maybe not. So in fact the UNION query solution is maybe better. But the key point is you should explain what you want in the tag_name field :-)

Upvotes: 1

Dmitry Teplyakov
Dmitry Teplyakov

Reputation: 2908

Try to not use condition in LEFT JOIN:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'
AND t.tag_name LIKE '%story%'

GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

EDIT: If you want to fetch rows with page title contains 'article title' and rows that have not that titles but have needed keyword, use this query (As @user985935 suggested):

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE (p.page_title LIKE '%article title 8%'
OR t.tag_name LIKE '%story%')
AND p.page_hide != '1'


GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

Upvotes: 2

Christopher Pelayo
Christopher Pelayo

Reputation: 802

here is the sample query I'm mentioning on the comment:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_hide != '1'
AND (t.tag_name LIKE '%story%' OR p.page_title LIKE '%article title 8%')
GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

Upvotes: 0

Related Questions