MyStackRunnethOver
MyStackRunnethOver

Reputation: 5275

Hive SQL - Test for \u0000 (ascii 00) without `chr()`

I have a dataset with some corrupted data - a string column has some strings containing \u0000. I need to filter out all of them, and the only thing I have at my disposal is the where clause.

I tried WHERE field NOT LIKE concat('%', chr(00), '%'), but my hive distro (which is AWS EMR) doesn't recognize chr(). Is there another option for filling out my where clause to filter out fields containing \u0000, without using chr()?

Upvotes: 4

Views: 651

Answers (2)

Chema
Chema

Reputation: 2838

You could try as follow

SELECT '\u0000' AS text;

+-------+--+
| text  |
+-------+--+
|      |
+-------+--+
-- NOT EMPTY
SELECT '\u0000abc' AS text;

+-------+--+
| text  |
+-------+--+
| abc  |
+-------+--+
-- NOT EMPTY

so

SELECT text 
FROM(SELECT '\u0000abc' AS text) AS t  
WHERE text NOT LIKE('\u0000%');

+-------+--+
| text  |
+-------+--+
+-------+--+
-- EMPTY
SELECT text 
FROM(SELECT '\u0000abc' AS text) AS t  
WHERE text LIKE('\u0000%');

+-------+--+
| text  |
+-------+--+
| abc  |
+-------+--+
-- NOT EMPTY

Upvotes: 1

GoodDok
GoodDok

Reputation: 1850

Try out the following:

WHERE field NOT LIKE '%\000%'

Upvotes: 0

Related Questions