Reputation: 111060
I have two tables:
posts (id, published_at)
posts_images (id, post_id, image_url(null or url string))
Each post has at least 1 posts_images record and can have more than 1 as well.
My Goal: query shows me the % of posts that have 1 or more images, broken down by week (7 days) ago.
Here is my query:
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(DISTINCT p.id) AS "Posts in Cohort"
count(pc.image_url) / count(p.id) AS "Post w 1 or more Images Ratio",
FROM posts p
INNER JOIN posts_images pc
ON p.id = pc.post_id
WHERE p.published_at IS NOT NULL
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
The query runs fine and outputs data but I'm not sure if I'm performing the JOIN correctly given that posts have 1 or more posts_images. I'm worried that SQL is picking the first posts_images record and not looking at all of them.
Am I doing this right?
Upvotes: 0
Views: 39
Reputation: 142518
I would start by finding cases of multiple images:
SELECT post_id, COUNT(*) AS ct
FROM posts_images
GROUP BY post_id
HAVING ct > 1
Then I would go to posts
to find what week is involved:
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(*) AS "Posts in Cohort"
ROUND(SUM(x.ct) / count(*), 3) AS "Post w 1 or more Images Ratio",
FROM ( .. the query above .. ) AS x
JOIN posts AS p ON x.post_id = p.id
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
The advantage over your approach is that the intermediate temp table is smaller (one row per post vs one row per image).
Potential problem:
FLOOR
to get start/end of the "week".LEFT JOIN
.Upvotes: 1
Reputation: 1270773
I think you are better off with two levels of aggregation:
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(*) as "Posts in Cohort",
avg(has_image) as "Post w 1 or more Images Ratio",
FROM (SELECT p.id, p.created_at,
( MAX(pi.image_url) IS NOT NULL ) as has_image
FROM posts p JOIN
posts_images pi
ON p.id = pi.post_id
WHERE p.published_at IS NOT NULL
GROUP BY p.id
) p
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
Upvotes: 3