catbird
catbird

Reputation: 13

Hive SQL Distinct Column Syntax Error when calling multiple columns

After using a WITH clause and series of inner joins, I attempted to call back three columns: Employees, SalesID and a COUNT(DISTINCT) and encountered a Syntax Error.

This is for a hadoop environment with hive sql. With SELECT Employees, SalesID, Workload, all appropriate data recalls; however, a syntax issue arises when performing the COUNT(DISTINCT).

Successful Results:

SELECT SalesID
, COUNT(DISTINCT workload) AS Cases

GROUP BY SalesID 

Also Successful Results:

SELECT EmployeeName
    , SalesID
    , Workload AS Cases

ORDER BY SalesID

Syntax Error:

SELECT  EmployeeName
    , SalesID
    , COUNT(DISTINCT workload) Cases

GROUP BY SalesID

I expect the output to reflect:

EmployeeName   | SalesID | Cases
First Last1    | 12345   |   4
First Last2    | 23455   |   2
First Last3    | 45633   |   7

Upvotes: 1

Views: 473

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269553

First, none of your queries have a FROM clause, so all should generate syntax errors.

More importantly, the SELECT and GROUP BY lists should match -- all unaggregated columns should be in the GROUP BY:

SELECT  EmployeeName, SalesID, COUNT(DISTINCT workload) as Cases
FROM t
GROUP BY EmployeeName SalesID

Upvotes: 1

Related Questions