Reputation: 4360
I'm using Hive, and have a database of strings. I want the rows with certain words, but need word boundaries (if I'm searching for "transport", I don't want a row with "transporter"). How do I do this?
Currently, I'm doing something like the following:
hive> select * from myTable where (length(regexp_extract(column1, 'transport', 0)) > 0);
But obviously this is not incorporating any word boundaries.
Thanks in advance!
Upvotes: 1
Views: 2472
Reputation: 70722
Hive uses java regular expression syntax. You can try the following.
regexp_extract(column1, '\btransport\b')
If it fails, try double escaping your word boundaries \\b
Upvotes: 3
Reputation: 89547
I know that in POSIX compliant regexes, word boundaries are like this: [[:<:]]
and [[:>:]]
:
[[:<:]]transport[[:>:]]
However, I don't know if it is supported by hiveQL.
Upvotes: 1