lalith kkvn
lalith kkvn

Reputation: 310

Hive collect_list() does not collect NULL values

I am trying to collect a column with NULLs along with some values in that column...But collect_list ignores the NULLs and collects only the ones with values in it. Is there a way to retrieve the NULLs along with other values ?

SELECT col1, col2, collect_list(col3) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2;

Actual col3 values

0.9
NULL
NULL
0.7
0.6 

Resulting col3 values

[0.9, 0.7, 0.6]

I was hoping that there is a hive solution that looks like this [0.9, NULL, NULL, 0.7, 0.6] after applying the collect_list.

Upvotes: 8

Views: 21427

Answers (2)

Zsuzsa
Zsuzsa

Reputation: 427

This function works like this, but I've found the following workaround. Add a case when statement to your query to check and keep NULLs.

SELECT col1, 
    col2, 
    collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE col3 END) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2

Now, because you had a string element ('NULL') the whole result set is an array of strings. At the end just convert the array of strings to an array of double values.

Upvotes: 9

Deb
Deb

Reputation: 471

Note: If your column is STRING it won't be having a NULL value even though your external file does not have any data for that column

you can a where condition with validation check like "col3 is NULL and col3 is not NULL"

Upvotes: 0

Related Questions