user3031097
user3031097

Reputation: 177

Hive query for a specific scenario

I have a scenario for which I wanted to write a Hive query. The scenario is here:

The data is like below:

enter image description here

Now, the output I am expecting is:

enter image description here

Null/empty values to be ignored.

Essentially, I want the sum of values in columns col1, col2, col3 and col4 and grouped the values. The 'id' column is irrelevant here, but provided for illustration.

Any helping in writing a HiveQL query for this scenario would help.

Thanks.

Upvotes: 0

Views: 643

Answers (2)

Gokul
Gokul

Reputation: 1

Another way to accomplish this in HIVE

select e.single_col,count(e.single_col) from (
select stack(4,col1,col2,col3,col4) as single_col from challenge ) e
where e.single_col != ''
group by e.single_col;

challenge is the table used here. The data is loaded into the table as a text format delimited with commas

1,A,B,,D
2,B,C,D,
3,,D,C,
4,,A,A,

Upvotes: 0

amow
amow

Reputation: 2223

select val, count(1)
from test_table lateral view explode(array(col1,col2,col3,col4)) t as val
where val is not null
group by val

explain:

test_table is your table name.
If you have more columns than 4, put them in the array function.

First I collect all of your columns into an array.And then use the lateral view explode(array) to make the array items in one column.At last group by this column and make the count.

Upvotes: 1

Related Questions