Hive SQL nested query use similar column

Question

I have a query that includes two subqueries with similar column 'day'. I would like to show values in a following way:

day cnt1 cnt_total

But in a query I have it does not recognize that the day column is similar and makes a multiplication of all rows in nested statement one by all rows in nested statement two.

Is there a way to make it recognize that the day column is similar?

The query looks as follows:

SELECT p1.day, p1.count AS cnt1, p2.count AS cnt_total
FROM
(
SELECT day, COUNT(DISTINCT id) AS count FROM table
WHERE 1=1
  AND service="service"
  AND action="action"
  AND path LIKE "%search%"
  AND year="2021"
GROUP BY day
) p1,
(
SELECT day, COUNT(DISTINCT id) AS count FROM table
WHERE 1=1
  AND service="service"
  AND action="action"
  AND year="2021"
GROUP BY day
) p2;

Gordon Linoff · Accepted Answer

You should be able to do this with conditional aggregation, so only one SELECT is needed:

SELECT day,
       COUNT(DISTINCT CASE WHEN action = 'mousedown' AND data["path"] LIKE '%go-to-latest-search%' THEN gsid END) AS count,
       COUNT(DISTINCT CASE WHEN action = 'impress' THEN gsid END) as cnt_total
FROM hit
WHERE service = 'sauto' AND
      year = '2021' AND
      month = '07'
GROUP BY day

Hive SQL nested query use similar column

Answers (1)

Related Questions