user3802527
user3802527

Reputation: 1

HIVE JOIN query error

I'm new to using HIVE and have been having some troubles getting one of my queries to work properly. All the variables used below are defined in our tables and I'm trying to isolate the data in the delivery_view table that have uids which exist in the attribution table with the limitations stated in the WHERE statement.

The current error I'm dealing with is that HIVE thinks the uid in the GROUP BY command is a function. Previously, I received an error stating that an EOF was necessary near the LEFT JOIN statement however, it appears as though that was corrected, although I'm not entirely sure how.

Any help would be much appreciated!

SELECT COUNT(DISTINCT attribution_view.uid)
FROM attribution_view
LEFT OUTER JOIN delivery_view ON (attribution_view.uid = delivery_view.uid)
WHERE attribution_view.conversion_agency_id = 19 AND
    attribution_view.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00'
GROUP BY COUNT(DISTINCT attribution_view.uid)

Upvotes: 0

Views: 416

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

You can't group by an aggregation function. If you just want the total number of different uids, then remove that clause:

SELECT COUNT(DISTINCT attribution_view.uid)
FROM attribution_view
LEFT OUTER JOIN delivery_view ON (attribution_view.uid = delivery_view.uid)
WHERE attribution_view.conversion_agency_id = 19 AND
    attribution_view.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00';

However, if you just want matching rows, then use a regular join. I'm think of a query more like this:

SELECT COUNT(DISTINCT av.uid)
FROM attribution_view av INNER JOIN
     delivery_view dv
     ON av.uid = dv.uid
WHERE av.conversion_agency_id = 19 AND
      av.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00';

Also note the use a table aliases. These make the query easier to write and read.

EDIT:

Does this do what you want?

select cnt, count(*), min(uid), max(uid)
from (SELECT av.uid, count(*) as cnt
      FROM attribution_view av INNER JOIN
           delivery_view dv
           ON av.uid = dv.uid
      WHERE av.conversion_agency_id = 19 AND
            av.conversion_period BETWEEN '2014-06-01 12:00:00' AND '2014-06-01 13:00:00'
      GROUP BY av.uid
     ) u
group by cnt
order by 1;

Upvotes: 2

Related Questions