Reputation: 680
I have a table that looks like:
bl.ah
foo.bar
bar.fight
And I'd like to use HiveQL's regexp_extract
to return
bl
foo
bar
Upvotes: 4
Views: 8604
Reputation: 931
Given the docs data about regexp_extract:
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
So, if you have a table with a single column (let's call it description
for our example) you should be able to use regexp_extract as follows to get the data before a period, if one exists, or the entire string in the absence of a period:
regexp_extract(description,'^([^\.]+)\.?',1)
The components of the regex are as follows:
^
start of string([^\.]+)
any non-period character one or more times, in a capture group\.?
a period either once or no timesBecause the part of the string we're interested in will be in the first (and only) capture group, we refer to it by passing the index
parameter a value of 1.
Upvotes: 5