maia
maia

Reputation: 4360

Hive searching for substring with word boundaries

I'm using Hive, and have a database of strings. I want the rows with certain words, but need word boundaries (if I'm searching for "transport", I don't want a row with "transporter"). How do I do this?

Currently, I'm doing something like the following:

hive> select * from myTable where (length(regexp_extract(column1, 'transport', 0)) > 0);

But obviously this is not incorporating any word boundaries.

Thanks in advance!

Upvotes: 1

Views: 2472

Answers (2)

hwnd
hwnd

Reputation: 70722

Hive uses regular expression syntax. You can try the following.

regexp_extract(column1, '\btransport\b')

If it fails, try double escaping your word boundaries \\b

Upvotes: 3

Casimir et Hippolyte
Casimir et Hippolyte

Reputation: 89547

I know that in POSIX compliant regexes, word boundaries are like this: [[:<:]] and [[:>:]]:

[[:<:]]transport[[:>:]]

However, I don't know if it is supported by hiveQL.

Upvotes: 1

Related Questions