Reputation: 65
Hi i am new to hive i am using regexp_extract for getting substring from a string
my string is '/abc/def/ghi/'
how will get abc or def or ghi using regexp_extract function
Upvotes: 2
Views: 8532
Reputation: 1197
Below would be the query.
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',1);
OK
abc123.
Time taken: 0.103 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',2);
OK
def456.
Time taken: 0.1 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',3);
OK
ghi789
Time taken: 0.124 seconds, Fetched: 1 row(s)
Upvotes: 1
Reputation: 38290
Remove leading and trailing '/'
and use split()
to get an Array. split() is also using regexp:
hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[0];
abc
hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[1];
def
hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[2];
ghi
Or in a subquery:
hive> select array[0], array[1], array[2]
from (select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/') as array) s;
OK
_c0 _c1 _c2
abc def ghi
Time taken: 0.192 seconds, Fetched: 1 row(s)
Upvotes: 3
Reputation: 520908
We can use regexp_extract
by providing a pattern with capture groups targeting what we want to match. Then, we can specify which group should serve as the replacement.
As an example, to find the content between the second and third path separators, we can try:
regexp_extract('/abc/def/ghi/', '/[^/]+/([^/]+).*', 1)
Note: The above is untested and may give error should it be necessary to escape the forward slashes. In that case, use the following:
regexp_extract('/abc/def/ghi/', '\/[^\/]+\/([^\/]+).*', 1)
Upvotes: 1