Reputation: 3303
I am trying to find the rows from a hive table where a particular column does not contain null values or \N values or STX character '\002'. The objective is to find which rows contain some characters other than these three.
I tried this hive query:
select column1,length(regexp_replace(column1,'\N|\002|NULL','')) as value
FROM table1 LIMIT 10;
I was expecting zero in the following cases but I am getting the following:
column1 value
NULL NULL
0
NULL NULL
0
\N\N\N\N\N\N\N\N 8
NULL NULL
\N\N\N\N\N\N\N\N 8
NULL NULL
NULL NULL
\N\N\N 3
Could someone please help me on the correct regex for the above case?
Thank you.
Ravi
Upvotes: 3
Views: 2388
Reputation: 19423
It looks that hive is using Java's regular expression engine so the problem seems to be with the regex itself, more specifically in the escape sequences.
Try the following and if it doesn't work then please let me know:
(?:(?:\\\\N)+|\002|NULL)
Upvotes: 1