Babbi
Babbi

Reputation: 65

REGEXP_EXTRACT in hive to get the substring of a string

Hi i am new to hive i am using regexp_extract for getting substring from a string

my string is '/abc/def/ghi/'

how will get abc or def or ghi using regexp_extract function

Upvotes: 2

Views: 8532

Answers (3)

Vijiy
Vijiy

Reputation: 1197

Below would be the query.

   hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',1);
OK
abc123.
Time taken: 0.103 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',2);
OK
def456.
Time taken: 0.1 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',3);
OK
ghi789
Time taken: 0.124 seconds, Fetched: 1 row(s)

Upvotes: 1

leftjoin
leftjoin

Reputation: 38290

Remove leading and trailing '/' and use split() to get an Array. split() is also using regexp:

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[0];

abc

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[1];

def

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[2];

ghi

Or in a subquery:

hive> select array[0], array[1], array[2] 
      from (select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/') as array) s;
OK
_c0     _c1     _c2
abc     def     ghi
Time taken: 0.192 seconds, Fetched: 1 row(s)

Upvotes: 3

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520908

We can use regexp_extract by providing a pattern with capture groups targeting what we want to match. Then, we can specify which group should serve as the replacement.

As an example, to find the content between the second and third path separators, we can try:

regexp_extract('/abc/def/ghi/', '/[^/]+/([^/]+).*', 1)

Note: The above is untested and may give error should it be necessary to escape the forward slashes. In that case, use the following:

regexp_extract('/abc/def/ghi/', '\/[^\/]+\/([^\/]+).*', 1)

Upvotes: 1

Related Questions