R.Gold
R.Gold

Reputation: 95

Distinct count and group by in HIVE

I am very new to HIVE and have an issue with distinct count and GROUP BY.
I want to calculate maximum temperature from temperature_data table corresponding to those years which have at least 2 entries in the table- I tried with below query but it is not working

select 
SUBSTRING(full_date,7,4) as year, 
MAX(temperature) as temperature 
from temperature_data 
where count(distinct(SUBSTRING(full_date,7,4))) >= 2 
GROUP BY SUBSTRING(full_date,7,4);

I am getting an error-

FAILED: SemanticException [Error 10128]: Line 2:0 Not yet supported place for UDAF 'count'

Below is input-

year,zip,temperature
10-01-1990,123112,10
14-02-1991,283901,11
10-03-1990,381920,15
10-01-1991,302918,22
12-02-1990,384902,9
10-01-1991,123112,11
14-02-1990,283901,12
10-03-1991,381920,16
10-01-1990,302918,23
12-02-1991,384902,10
10-01-1993,123112,11

Upvotes: 1

Views: 4420

Answers (2)

MuraliSunil
MuraliSunil

Reputation: 156

@R.Gold, We can try to simplify the above query without using sub-query as below:

SELECT substring(full_date,7) as year, max(temperature)
FROM your-hive-table
GROUP BY substring(full_date,7)
HAVING COUNT(substring(full_date,7)) >= 2

And, fyi - we can't use aggregate functions with WHERE clause.

Upvotes: 0

vvg
vvg

Reputation: 6385

You should use HAVING keyword instead to set a condition on variable you're using for grouping.

Also, you can benefit of using subqueries. See below.

SELECT 
  year, 
  MAX(t1.temperature) as temperature 
FROM
  (select SUBSTRING(full_date,7,4) year, temperature  from temperature_data) t1 
GROUP BY 
  year
HAVING 
  count(t1.year) > 2;

Upvotes: 3

Related Questions