Phaelax z
Phaelax z

Reputation: 2009

Get columns associated with aggregate results involving multiple tables

Think web forum structure here. My current query gets the timestamps from the oldest and newest posts associated with each thread. What I can't seem to figure out is how to get the userid associated with those posts. I read through similar questions on here, but they had simpler structures where all the aggregated data could be grouped back to a single column. Do I need to break down the min, max, and count into 3 separate subqueries? I tried looking at adding userid to the group clause but that only duplicates the data for each user who owns a post in the threads. I'm at a loss on what direction to go from here.

https://www.db-fiddle.com/f/akG5WjvoBFPzzKXSBhF3Zz/0

Dummy tables:

CREATE TABLE post (id int NOT NULL AUTO_INCREMENT, postDate datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, userid int NOT NULL, threadId int NOT NULL, PRIMARY KEY(id));
INSERT INTO post (postDate, userid, threadId) VALUES("2025-3-1 16:30:00", 1, 7);
INSERT INTO post (postDate, userid, threadId) VALUES("2025-3-1 17:57:00", 2, 7);
INSERT INTO post (postDate, userid, threadId) VALUES("2025-3-1 19:23:00", 3, 7);

CREATE TABLE users (id int NOT NULL AUTO_INCREMENT, name varchar(50), PRIMARY KEY(id));
INSERT INTO users (id, name) VALUES(1, "Bob");
INSERT INTO users (id, name) VALUES(2, "Steve");
INSERT INTO users (id, name) VALUES(3, "Mary");

CREATE TABLE thread (id int NOT NULL AUTO_INCREMENT, title varchar(100), PRIMARY KEY(id));
INSERT INTO thread (id, title) VALUES(5, "Another thread");
INSERT INTO thread (id, title) VALUES(6, "Some foxy stuff");
INSERT INTO thread (id, title) VALUES(7, "This is a test");

Query:

SELECT t.id, t.title, x.pFirst, x.pLast, x.postCount, x.userid
FROM thread t
LEFT JOIN (
    SELECT threadId, MIN(postDate) pFirst, MAX(postDate) pLast, count(p.id) as postCount
    FROM post p
    GROUP BY threadId
) x ON x.threadId = t.id
ORDER BY pLast DESC

Current Result:

id title pFirst pLast postCount
7 This is a test 2025-03-01 16:30:00 2025-03-01 19:23:00 3
5 Another thread null null null
6 Some foxy stuff null null null

Result I'd like to get:

id title pFirst uFirst pLast uLast postCount
7 This is a test 2025-03-01 16:30:00 1 2025-03-01 19:23:00 3 3
5 Another thread null null null
6 Some foxy stuff null null null

Upvotes: 0

Views: 52

Answers (1)

ValNik
ValNik

Reputation: 5916

You can detect first and last post with window functions row_number() and count().
Row row_number()=1 is first and row_number()=count(
)(...) is last.

Then this 2 rows for every thread pivot by conditional aggregation.

Also classic query example (see bottom of answer) works well with index.

See example

with first_last as(
  select threadId,cnt
     ,min(case when rn=1 then userId end) uFirst
     ,min(case when rn=cnt then userId end) uLast
     ,min(case when rn=1 then postDate end) pFirst
     ,min(case when rn=cnt then postDate end) pLast
  from(
    select *
      ,row_number()over(partition by threadId order by postDate)rn
      ,count(*)over(partition by threadId)cnt
    from post
  )q
  where rn=1 or rn=cnt
  group by threadId
)
select t.*, uFirst,pFirst,uLast,pLast,cnt as postCount
from thread t
left join first_last p on p.threadid=t.Id

id title uFirst pFirst uLast pLast postCount
5 Another thread null null null null null
6 Some foxy stuff null null null null null
7 This is a test 1 2025-03-01 16:30:00 3 2025-03-01 19:23:00 3

fiddle

Let's consider another example. If CTE result is materialized (this 2 row per thread output), performance will be better.

with first_last as(
  select *
  from(
    select *
      ,row_number()over(partition by threadId order by postDate)rn
      ,count(*)over(partition by threadId)cnt
    from post
  )q
  where rn=1 or rn=cnt
)
select t.*
  ,p1.userId as uFirst,p1.PostDate as pFirst
  ,p2.userId as uLast,p2.PostDate as pLast
  ,p1.cnt postCount
from thread t
left join first_last p1 on p1.threadid=t.Id and p1.rn=1
left join first_last p2 on p2.threadid=t.Id and p2.rn=p2.cnt

Update1
For mySql version 5.7 (and other versions - no specific functions)

select t.*
  ,p1.userId as uFirst,p1.PostDate as pFirst
  ,p2.userId as uLast,p2.PostDate as pLast
  ,fl.cnt postCount
from thread t
left join (
    select threadId,min(postDate)pFirst,max(postDate)pLast,count(*) cnt
    from post
    group by threadId
 ) fl on fl.threadid=t.Id 
left join post p1 on p1.threadid=t.Id and p1.postDate=fl.pFirst
left join post p2 on p2.threadid=t.Id and p2.postDate=fl.pLast

By the way, it's not bad in terms of performance.
Index ix_posts_thread_postdate on post (threadId,postDate) match to this query.
model fiddle
and for MariaDb

Upvotes: 2

Related Questions