Michael Ziluck
Michael Ziluck

Reputation: 655

Combine a CROSS JOIN and a LEFT JOIN

I have two tables named author and commit_metrics. Both of them have an id field. Author has author_name and author_email. Commit_metrics has author_id and author_date.

I am trying to write a query that will get the number of commits that each author had in a given week, even if that number is 0. Here's what I have so far:

SELECT a.id, a.author_name, a.author_email, c.week_num, COUNT(c.id)
FROM author AS a
     CROSS JOIN generate_series(1, 610) AS s(n)
     LEFT JOIN  (SELECT c.id,
                        c.author_id,
                        c.author_date,
                        WEEK_NUMBER(c.author_date) AS week_num
                 FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
WHERE c.week_num IS NOT NULL
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;

WEEK_NUMBER is a function I wrote for this query:

CREATE OR REPLACE FUNCTION WEEK_NUMBER(date TIMESTAMP) RETURNS INTEGER AS
$$
SELECT TRUNC(DATE_PART('day', date - '2008-01-01') / 7)::INTEGER;
$$ LANGUAGE SQL;

Currently, the query works like a charm with one major caveat. It doesn't properly calculate 0 when the author made no commits in a given week. I'm not sure why it doesn't. When I do the query with just the FROM and CROSS JOIN, it properly prints the many thousand combined authors/weeks. However, when I add the LEFT JOIN, it loses any week where the author did not make a commit.

Any help would be greatly appreciated. I'm open to doing away with the generate_series call if it's unnecessary.

Also, I found this post, but I don't think it's helpful for my case.

Upvotes: 1

Views: 4503

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270411

Remove the filtering condition. Also a subquery is not needed and you want to select s.n instead of c.week_num:

SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id)
FROM author a CROSS JOIN
     generate_series(1, 610) AS s(n) LEFT JOIN
     commit_metrics c
     ON s.n = WEEK_NUMBER(c.author_date) AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;

Upvotes: 0

Charles Fox
Charles Fox

Reputation: 281

Although you are using a left join, "WHERE c.week_num IS NOT NULL" filters out all of the cases where there is no post. Try this:

SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id) as post_count
FROM author AS a
     CROSS JOIN generate_series(1, 610) AS s(n)
     LEFT JOIN  (SELECT c.id,
                        c.author_id,
                        c.author_date,
                        WEEK_NUMBER(c.author_date) AS week_num
                 FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, s.n
ORDER BY s.n DESC, a.author_name;

Upvotes: 3

Combee Bowlin
Combee Bowlin

Reputation: 116

Your WHERE clause is excluding the records on commit_metrics that are null, which is the case when the author has no commits during the week selected. You should just remove this from the WHERE clause to get your desired output.

If you need the WHERE clause to eliminate some of the CROSS JOIN records based on your data, you will need that CROSS JOIN and WHERE to be in a sub-select that you LEFT JOIN to, or create some more complicated logic in the current WHERE clause.

Upvotes: 1

Related Questions