AnApprentice
AnApprentice

Reputation: 111060

How to join an associated table when the joined table can have one or more records?

I have two tables:

posts (id, published_at)
posts_images (id, post_id, image_url(null or url string))

Each post has at least 1 posts_images record and can have more than 1 as well.

My Goal: query shows me the % of posts that have 1 or more images, broken down by week (7 days) ago.

Here is my query:

SELECT  floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
        date(min(p.created_at)) AS "Date Start",
        date(max(p.created_at)) AS "Date End",
        count(DISTINCT p.id) AS "Posts in Cohort"
        count(pc.image_url) / count(p.id) AS "Post w 1 or more Images Ratio",
FROM    posts p
        INNER JOIN posts_images pc
            ON p.id = pc.post_id
WHERE p.published_at IS NOT NULL
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;

The query runs fine and outputs data but I'm not sure if I'm performing the JOIN correctly given that posts have 1 or more posts_images. I'm worried that SQL is picking the first posts_images record and not looking at all of them.

Am I doing this right?

Upvotes: 0

Views: 39

Answers (2)

Rick James
Rick James

Reputation: 142518

I would start by finding cases of multiple images:

SELECT post_id, COUNT(*) AS ct
    FROM posts_images
    GROUP BY post_id
    HAVING ct > 1

Then I would go to posts to find what week is involved:

SELECT  floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago
        date(min(p.created_at)) AS "Date Start",
        date(max(p.created_at)) AS "Date End",
        count(*) AS "Posts in Cohort"
        ROUND(SUM(x.ct) / count(*), 3) AS "Post w 1 or more Images Ratio",
    FROM ( .. the query above .. ) AS x
    JOIN posts AS p  ON x.post_id = p.id
    GROUP BY weeks_ago
    ORDER BY weeks_ago DESC;

The advantage over your approach is that the intermediate temp table is smaller (one row per post vs one row per image).

Potential problem:

  • Date Start-End may not be exactly days. This can be fixed by working backwards from the FLOOR to get start/end of the "week".
  • Missing weeks. This would require another table with all the weeks, plus a messy LEFT JOIN.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270773

I think you are better off with two levels of aggregation:

SELECT  floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
        date(min(p.created_at)) AS "Date Start",
        date(max(p.created_at)) AS "Date End",
        count(*) as  "Posts in Cohort",
        avg(has_image) as "Post w 1 or more Images Ratio",
FROM (SELECT p.id, p.created_at,
             ( MAX(pi.image_url) IS NOT NULL ) as has_image
      FROM posts p JOIN
           posts_images pi
           ON p.id = pi.post_id
      WHERE p.published_at IS NOT NULL
      GROUP BY p.id
     ) p
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;

Upvotes: 3

Related Questions