Reputation: 177
I have a scenario for which I wanted to write a Hive query. The scenario is here:
The data is like below:
Now, the output I am expecting is:
Null/empty values to be ignored.
Essentially, I want the sum of values in columns col1, col2, col3 and col4 and grouped the values. The 'id' column is irrelevant here, but provided for illustration.
Any helping in writing a HiveQL query for this scenario would help.
Thanks.
Upvotes: 0
Views: 643
Reputation: 1
Another way to accomplish this in HIVE
select e.single_col,count(e.single_col) from (
select stack(4,col1,col2,col3,col4) as single_col from challenge ) e
where e.single_col != ''
group by e.single_col;
challenge is the table used here. The data is loaded into the table as a text format delimited with commas
1,A,B,,D
2,B,C,D,
3,,D,C,
4,,A,A,
Upvotes: 0
Reputation: 2223
select val, count(1)
from test_table lateral view explode(array(col1,col2,col3,col4)) t as val
where val is not null
group by val
explain:
test_table
is your table name.
If you have more columns than 4, put them in the array
function.
First I collect all of your columns into an array.And then use the lateral view explode(array)
to make the array items in one column.At last group by this column and make the count.
Upvotes: 1