Peter
Peter

Reputation: 131

Hive/SQL count occurrences for multiple columns and rows

I'm looking for a smart way to count occurrences.

Here is an example:

 UserID     CityID    CountryID   TagID
 100000      1         30        5
 100001      1         30        6
 100000      2         20        7
 100000      2         40        8
 100001      1         40        6
 100002      1         40        5
 100002      1         20        6

What I wanna do:

I want to count the occurrences of values columnwise and for each user. In the end I want a table which shows me how many user have more than to different characteristics.

The result should look like this - more or less

Different_CityID    Different_CountryIDs   Different_TagIDs
1                   3                      2

Explanation:

I struggled with COUNTs for the columns and GROUP BYs but in the end it doesn't work out. Is there a smart solution?

Thanks a lot

Upvotes: 2

Views: 7364

Answers (3)

overflow
overflow

Reputation: 79

select uid,cid,count(c),count(g) from(select cid,uid,count(coid) over(partition by cid,uid) as c,count(tagid) over(partition by cid,tagid) as g from citydata)e group by cid,uid;

here uid = userid,cid =cityid,coid = countryid,tagid

Total MapReduce CPU Time Spent: 0 msec OK uid cid coid tagid 100000 1 1 1 100001 1 2 2 100002 1 2 2 100000 2 2 2 Time taken: 3.865 seconds, Fetched: 4 row(s)

based on userid i hope this will be helpful

Upvotes: 1

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

select  count(case when pos=0 and count_distinct_ID>1 then 1 end) as different_cityid
       ,count(case when pos=1 and count_distinct_ID>1 then 1 end) as different_countryid
       ,count(case when pos=2 and count_distinct_ID>1 then 1 end) as different_tagid

from   (select      pe.pos
                   ,count (distinct pe.ID) as count_distinct_ID
        from        mytable t
                    lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

        group by    t.UserID
                   ,pe.pos        
        ) t          
;

+------------------+---------------------+-----------------+
| different_cityid | different_countryid | different_tagid |
+------------------+---------------------+-----------------+
|                1 |                   3 |               2 |
+------------------+---------------------+-----------------+

Here is another variation that avoids the count(distinct ...)

select  count (case when pos=0 and not is_distinct_ID then 1 end)  as different_cityid
       ,count (case when pos=1 and not is_distinct_ID then 1 end)  as different_countryid
       ,count (case when pos=2 and not is_distinct_ID then 1 end)  as different_tagid

from   (select      pe.pos
                   ,min(pe.ID)<=>max(pe.ID)  as is_distinct_ID
        from        mytable t
                    lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

        group by    t.UserID
                   ,pe.pos        
        ) t          
; 

...and another variation

select  count (case when not is_distinct_CityID    then 1 end)   as different_cityid
       ,count (case when not is_distinct_CountryID then 1 end)   as different_countryid
       ,count (case when not is_distinct_TagID     then 1 end)   as different_tagid

from   (select      min (CityID)    <=> max (CityID)     as is_distinct_CityID
                   ,min (CountryID) <=> max (CountryID)  as is_distinct_CountryID
                   ,min (TagID)     <=> max (TagID)      as is_distinct_TagID

        from        mytable

        group by    UserID     
        ) t          
;

Upvotes: 1

Vinutha N
Vinutha N

Reputation: 166

Use the below code i think it help you,

SELECT COUNT(DISTINCT (CountryID)) AS CountryID,
COUNT(DISTINCT(CityID)) AS CityID,
COUNT(DISTINCT(TagID)) AS TagID
FROM test GROUP BY UserID

result will be like this,

CountryID   CityID  TagID
2   3   3
1   2   1
1   2   2

Regards, Vinu

Upvotes: 1

Related Questions