Reputation: 105
Is there a way to count a number of columns which has a particular value for each rows in Hive.
I have data which looks like in input and I want to count how many columns have value 'a' and how many column have value 'b' and get the output like in 'Output'.
Is there a way to accomplish this with Hive query?
Upvotes: 0
Views: 123
Reputation: 49270
Use lateral view
with explode
on the data and do the aggregations on it.
select id
,sum(cast(col='a' as int)) as cnt_a
,sum(cast(col='b' as int)) as cnt_b
,sum(cast(col in ('a','b') as int)) as cnt_total
from tbl
lateral view explode(array(ci_1,ci_2,ci_3,ci_4,ci_5)) tbl as col
group by id
Upvotes: 2
Reputation: 1271151
One method in Hive is:
select ( (case when cl_1 = 'a' then 1 else 0 end) +
(case when cl_2 = 'a' then 1 else 0 end) +
(case when cl_3 = 'a' then 1 else 0 end) +
(case when cl_4 = 'a' then 1 else 0 end) +
(case when cl_5 = 'a' then 1 else 0 end)
) as count_a,
( (case when cl_1 = 'b' then 1 else 0 end) +
(case when cl_2 = 'b' then 1 else 0 end) +
(case when cl_3 = 'b' then 1 else 0 end) +
(case when cl_4 = 'b' then 1 else 0 end) +
(case when cl_5 = 'b' then 1 else 0 end)
) as count_b
from t;
To get the total count, I would suggest using a subquery and adding count_a
and count_b
.
Upvotes: 2