Reputation: 95
I am very new to HIVE and have an issue with distinct count and GROUP BY.
I want to calculate maximum temperature from temperature_data table corresponding to those years which have at least 2 entries in the table-
I tried with below query but it is not working
select
SUBSTRING(full_date,7,4) as year,
MAX(temperature) as temperature
from temperature_data
where count(distinct(SUBSTRING(full_date,7,4))) >= 2
GROUP BY SUBSTRING(full_date,7,4);
I am getting an error-
FAILED: SemanticException [Error 10128]: Line 2:0 Not yet supported place for UDAF 'count'
Below is input-
year,zip,temperature
10-01-1990,123112,10
14-02-1991,283901,11
10-03-1990,381920,15
10-01-1991,302918,22
12-02-1990,384902,9
10-01-1991,123112,11
14-02-1990,283901,12
10-03-1991,381920,16
10-01-1990,302918,23
12-02-1991,384902,10
10-01-1993,123112,11
Upvotes: 1
Views: 4420
Reputation: 156
@R.Gold, We can try to simplify the above query without using sub-query as below:
SELECT substring(full_date,7) as year, max(temperature)
FROM your-hive-table
GROUP BY substring(full_date,7)
HAVING COUNT(substring(full_date,7)) >= 2
And, fyi - we can't use aggregate functions with WHERE clause.
Upvotes: 0
Reputation: 6385
You should use HAVING keyword instead to set a condition on variable you're using for grouping.
Also, you can benefit of using subqueries. See below.
SELECT
year,
MAX(t1.temperature) as temperature
FROM
(select SUBSTRING(full_date,7,4) year, temperature from temperature_data) t1
GROUP BY
year
HAVING
count(t1.year) > 2;
Upvotes: 3