mon
mon

Reputation: 11

Aggregate column based on comparison with array in Hive

Assume I have a hive table that looks like this:

|ID |CODE           |AMT  |NEW AMT|
|---|---------------|-----|-------|
|1  |['a','b',,,]   |10   |   50  |
|2  |[,,,'a','b']   |20   |   70  |
|3  |[,'c','d','e',]|30   |   20  |
|4  |['p','q',,,]   |40   |   20  |

The code column is of an array datatype. It can have 5 values and these values are being populated by an ETLjob. These values are comma separated. I need to find the aggregated value of AMT column keeping the following conditions in place:

  1. if the code has values 'a', 'b' then the value in amount for that id should be zero.
  2. if the code has values 'c','d','e' then the value in amount should be replaced with the value that is in new amt.
  3. if it doesn't match either of the above conditions, the value should be same as that in amt. After this, the sum of amt can be taken. So with the table given above, the sum(amt) should be 60.

I have been struggling with this as I am new to hql/sql. I have tried summing up using a case statement but failed. Thank you for any input you may have!

Upvotes: 1

Views: 304

Answers (2)

leftjoin
leftjoin

Reputation: 38290

"The code column is of an array datatype."

Use array_contains() function with case expressions:

select t.id, t.code, 
       case when array_contains(t.code, 'a') and array_contains(t.code, 'b')  then 0
            when array_contains(t.code, 'c') and array_contains(t.code, 'd') and array_contains(t.code, 'e') then t.new_amt
           else t.amt
       end AMT
  from table_name t

Upvotes: 1

Indra Kumar S
Indra Kumar S

Reputation: 2934

Just use if else or case when

Lets create a table with the sample data you provided

CREATE TABLE `table1` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `code` longtext NOT NULL,
 `amt` int(11) NOT NULL,
 `new_amt` int(11) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB   DEFAULT CHARSET=latin1;


INSERT INTO `table1` (`id`, `code`, `amt`, `new_amt`) VALUES
(1, '[\'a\',\'b\',,,]', 10, 50),
(2, '[,,,\'a\',\'b\']', 20, 70),
(3, '[,\'c\',\'d\',\'e\',]', 30, 20),
(4, '[\'p\',\'q\',,,]', 40, 20);

See how the table looks like SELECT * FROM table1

id code amt new_amt
1 ['a','b',,,] 10 50
2 [,,,'a','b'] 20 70
3 [,'c','d','e',] 30 20
4 ['p','q',,,] 40 20

Now use if else to decide the value

SELECT
    `code`,
    IF(
        `code` LIKE "%a','b%",
        0,
        IF(
            `code` LIKE "%c','d','e%",
            `new_amt`,
            `amt` + `new_amt`
        )
    ) AS price
FROM
    `table1`

Result :

id code price
1 ['a','b',,,] 0
2 [,,,'a','b'] 0
3 [,'c','d','e',] 20
4 ['p','q',,,] 60

Upvotes: 0

Related Questions