john mossel
john mossel

Reputation: 2156

Complex(ish) Joins Question:

I have the following table in a mySQL database, (nb: these are summarised so they are relevant to this question, some columns have been omitted).

author (id, username, password etc.)
thread (id, title, content)
tag (id, name)
reply (id, content)
thread_replies (thread_id, reply_id)
author_replies (author_id, reply_id)
thread_tags (thread_id, tag_id)
author_threads (author_id, thread_id)

Now to get threads by a certain Author, I normally do this:

SELECT thread.title, thread.id AS thread_id, thread.content, author.username, author.id AS author_id
FROM thread
JOIN author_threads ON thread.id = author_threads.thread_id
JOIN author ON author_threads.author_id = author.id
WHERE author.id = '12'

This works fine, however when I try getting the tags associated with these threads:

SELECT thread.title, thread.id AS thread_id, thread.content, author.username, author.id AS author_id, GROUP_CONCAT( DISTINCT tag.name
ORDER BY tag.name DESC
SEPARATOR ',' ) AS tags
FROM thread
JOIN thread_tags ON thread.id = thread_tags.thread_id
JOIN tag ON thread_tags.tag_id = tag.id
JOIN author_threads ON thread.id = author_threads.thread_id
JOIN author ON author_threads.author_id = author.id
WHERE author.id = '12'
LIMIT 0 , 30

It only shows the first thread, and all the tags associated with this author in one column.

What am I doing wrong here?

Also If I wanted to count the number of replies the author got for each thread how would this be done?

Upvotes: 0

Views: 111

Answers (2)

Adam Robinson
Adam Robinson

Reputation: 185703

Because you're using an aggregate in your query (GROUP_CONCAT), your query is being grouped. Since you have no group by clause, your group is the entire result set (hence seeing every tag the author has used). Because MySQL allows for using non-grouped columns in grouped statements, you aren't getting an error, but you aren't getting the query that you want.

In order to retrieve the proper results, you need to group your query on thread.id.

select
    thread.title, 
    thread.id as thread_id, 
    thread.content, 
    author.username, 
    author.id as author_id, 
    group_concat(distinct tag.name order by tag.name desc separator ',') as tags

from thread

join thread_tags ON thread.id = thread_tags.thread_id
join tag ON thread_tags.tag_id = tag.id
join author_threads ON thread.id = author_threads.thread_id
join author ON author_threads.author_id = author.id

where author.id = '12'

group by thread.id

limit 0 , 30

This should work in MySQL, though it's not ANSI-compliant SQL since you're using non-grouped columns in the select clause without any aggregation. You can leave this as-is, or you can write more compliant SQL and use something like max around all of the columns other than thread.id. This won't look as pretty, but it will be compliant.

SELECT 
    max(thread.title) as title, 
    thread.id as thread_id, 
    max(thread.content) as content, 
    max(author.username) as username, 
    max(author.id) as author_id, 
    group_concat(distinct tag.name order by tag.name desc separator ',') as tags

from thread

join thread_tags ON thread.id = thread_tags.thread_id
join tag ON thread_tags.tag_id = tag.id
join author_threads ON thread.id = author_threads.thread_id
join author ON author_threads.author_id = author.id

where author.id = '12'

group by thread.id

LIMIT 0 , 30

Reply Count

The above query (along with your original queries) are suitable only for retrieving the tag list. You could write an equivalent query to retrieve the reply count (assuming that replies aren't nested, in which case you'll have to use whatever recursive querying capabilities that MySQL provides, which I'm not familiar with), but to retrieve both in a single query requires subqueries:

select
    thread.title, 
    thread.id as thread_id, 
    thread.content, 
    author.username, 
    author.id, 
    (select group_concat(distinct tag.name order by tag.name separator ',')

    from thread_tags

    join tag on tag.id = thread_tags.tag_id 

    where thread_tags.thread_id = thread.id) as tags,
    (select count(1) from thread_replies where thread_id = thread.id) as reply_count

from thread

join author_threads ON thread.id = author_threads.thread_id
join author ON author_threads.author_id = author.id

where author.id = '12'

LIMIT 0 , 30

I've removed the group by from this query because our aggregate has been moved into a subselect, meaning that the outer query is no longer grouped.

Upvotes: 2

DRapp
DRapp

Reputation: 48179

I would prequery the threads and tag info ONLY inside first... Then, that will already have the author and valid threads to join to whatever else you need...

If you DO want to apply a limit, put that on the INNER "PREQUERY" as that will become the basis for joining the outer level tables... Otherwise, you get the inner query of 100's or 1000's plus entries just to be joined to the other tables and chopped to 30 records... have IT stop at 30 and you're done..

You may also want to order by most current threads when limiting the entries returned too.

select STRAIGHT_JOIN
      PreQuery.*,
      Author.username,
      Thread.title,
      Thread.Content

   from 
      ( select STRAIGHT_JOIN
              author_threads.author_id,
              author_threads.thread_id,
              group_concat(distinct tag.name order by tag.name desc separator ',') as tags
           from 
              author_threads
                 join thread_tags
                    on author_threads.thread_id = thread_tags.thread_id
                    join tag
                       on thread_tags.tag_id = tag.id
           where
              author_threads.author_id = '12'
           group by
              author_threads.author_id,
              author_threads.thread_id
           limit 0, 30 ) PreQuery

      join author
         on PreQuery.Author_ID = author.id

      join thread
         on PreQuery.Thread_id = thread.id

Upvotes: 0

Related Questions