Reputation: 993
I have both a specific question and a more general question... for context, I'm using Athena. Here's my specific question:
1) I'm trying to write a query where I calculate the DAUs for 2 successive months broken down by country (I eventually want the DAUs/MAUs broken down by country and day but thought I'd start simple...). My thinking on how to approach this was I'd need two queries, one for each month, that used a group by on country and applied the necessary filters. Finally, I'd be able to join on country to get 1 column that contained countries, and the following columns containing the two months of DAUS. Here's the query I rustled up:
SELECT count(distinct uuid) m1, country
FROM user u1
WHERE month(date(dt)) = 12
AND YEAR(date(dt)) = 2018
INNER JOIN (SELECT count(distinct uuid) m2, country
FROM user
WHERE month(date(dt)) = 11
AND YEAR(date(dt)) = 2018 GROUP BY country
) t ON t.country = u1.country
GROUP BY country
What's the issue with this? Error: mismatched input 'join' expecting {<eof>
2) My more general question is, how should I think when constructing more complex queries? I've programmed for years yet I seem to always find myself getting stuck with complicated SQL queries and hitting error message after error message (which also seem almost uninterpretable).
Thanks for all advice :)
Upvotes: 0
Views: 4111
Reputation: 60502
You want to join the result of two Selects:
SELECT t2.*, t.m2
FROM
(
SELECT count(distinct uuid) m1, country
FROM user u1
WHERE month(date(dt)) = 12
AND YEAR(date(dt)) = 2018
GROUP BY country
) as t2
JOIN
(
SELECT count(distinct uuid) m2, country
FROM user
WHERE month(date(dt)) = 11
AND YEAR(date(dt)) = 2018
GROUP BY country
) as t
ON t.country = t2.country
But this will remove countries which don't have data for both months and then you have to switch to a FULL OUTER JOIN.
In your case it's much easier to use conditional aggregation:
SELECT country
,count(distinct case when month(date(dt)) = 12 AND YEAR(date(dt)) = 2018 then uuid end) as m1
,count(distinct case when month(date(dt)) = 11 AND YEAR(date(dt)) = 2018 then uuid end) as m2
FROM user
GROUP BY country
Upvotes: 0
Reputation: 133400
The where and group by should be place after the the where join clause
SELECT count(distinct uuid) m1, country
FROM user u1
INNER JOIN (SELECT count(distinct uuid) m2, country
FROM user
WHERE month(date(dt)) = 11
AND YEAR(date(dt)) = 2018 GROUP BY country
) t ON t.country = u1.country
WHERE month(date(dt)) = 12
AND YEAR(date(dt)) = 2018
GROUP BY country
Upvotes: 1