Converting a subquery to a single query in Hive

Question

I have a query that needs a count of colA grouped by colB and by the average colC value. For example:

SELECT COUNT( X.colA ), X.colB , X.MEASURE
FROM (
  SELECT colA  , colB  , avg(colC) MEASURE
  FROM tableA
  GROUP BY colA, colB
  HAVING round(avg(colC),2) > 0
) X 
GROUP BY X.MEASURE , X.colB
HAVING X.MEASURE BETWEEN 0 AND 3000
ORDER BY MEASURE

An example result could be:

No of User, URL    , average time spent
90182     , abc.com,    334
293556    , def.com,     33
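
To make the intended result concrete, the two aggregation levels can be sketched in plain Python on a few made-up rows. The data and the function name `count_by_measure` are hypothetical; this only illustrates what the query computes, not how Hive executes it.

```python
from collections import defaultdict

# Hypothetical rows of tableA: (colA, colB, colC) = (user, URL, time spent)
rows = [
    ("u1", "abc.com", 300),
    ("u1", "abc.com", 500),
    ("u2", "abc.com", 400),
    ("u3", "def.com", 33),
    ("u4", "def.com", 0),
]

def count_by_measure(rows, low=0, high=3000):
    # Stage 1 (inner query): accumulate sum and count of colC per (colA, colB)
    sums = defaultdict(lambda: [0, 0])
    for col_a, col_b, col_c in rows:
        acc = sums[(col_a, col_b)]
        acc[0] += col_c
        acc[1] += 1

    # Stage 2 (outer query): count colA per (MEASURE, colB)
    counts = defaultdict(int)
    for (col_a, col_b), (total, n) in sums.items():
        measure = total / n
        # Inner HAVING round(avg(colC), 2) > 0, outer HAVING BETWEEN low AND high
        if round(measure, 2) > 0 and low <= measure <= high:
            counts[(measure, col_b)] += 1

    # ORDER BY MEASURE
    return sorted((m, b, c) for (m, b), c in counts.items())

for measure, url, users in count_by_measure(rows):
    print(users, url, measure)
```

The second stage depends on the averages produced by the first, which is why the query is written with two levels of grouping.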

The problem with the above query is that, because it uses a subquery, the inner query shuffles a huge amount of data as an intermediate result to the outer query, which makes the query very slow on large data sets.

Is there a way to convert the above query into one without any subquery, or is there a UDAF available, so that there is no major shuffle of intermediate data and it runs in a single stage?
