Reputation: 1
I'm new to using HIVE and have been having some troubles getting one of my queries to work properly. All the variables used below are defined in our tables and I'm trying to isolate the data in the delivery_view table that have uids which exist in the attribution table with the limitations stated in the WHERE statement.
The current error I'm dealing with is that HIVE thinks the uid in the GROUP BY command is a function. Previously, I received an error stating that an EOF was necessary near the LEFT JOIN statement however, it appears as though that was corrected, although I'm not entirely sure how.
Any help would be much appreciated!
SELECT COUNT(DISTINCT attribution_view.uid)
FROM attribution_view
LEFT OUTER JOIN delivery_view ON (attribution_view.uid = delivery_view.uid)
WHERE attribution_view.conversion_agency_id = 19 AND
attribution_view.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00'
GROUP BY COUNT(DISTINCT attribution_view.uid)
Upvotes: 0
Views: 416
Reputation: 1269463
You can't group by an aggregation function. If you just want the total number of different uid
s, then remove that clause:
SELECT COUNT(DISTINCT attribution_view.uid)
FROM attribution_view
LEFT OUTER JOIN delivery_view ON (attribution_view.uid = delivery_view.uid)
WHERE attribution_view.conversion_agency_id = 19 AND
attribution_view.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00';
However, if you just want matching rows, then use a regular join. I'm think of a query more like this:
SELECT COUNT(DISTINCT av.uid)
FROM attribution_view av INNER JOIN
delivery_view dv
ON av.uid = dv.uid
WHERE av.conversion_agency_id = 19 AND
av.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00';
Also note the use a table aliases. These make the query easier to write and read.
EDIT:
Does this do what you want?
select cnt, count(*), min(uid), max(uid)
from (SELECT av.uid, count(*) as cnt
FROM attribution_view av INNER JOIN
delivery_view dv
ON av.uid = dv.uid
WHERE av.conversion_agency_id = 19 AND
av.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00'
GROUP BY av.uid
) u
group by cnt
order by 1;
Upvotes: 2