Luis Howe
Luis Howe

Reputation: 9

How can I extract the numerical prefix of a string using REGEX_EXTRACT on Hive?

I'm not sure how to write my regex command on Hive to pull the numerical prefix substring from this string: 211118-1_20569 - (DHCP). I need to return 211118, but also have the flexibility to return digits with smaller or larger values depending on the size of the numerical prefix.

Upvotes: 0

Views: 231

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44951

hive> select regexp_extract('211118-1_20569 - (DHCP)','^\\d+',0);
OK
211118

or

hive> select regexp_extract('211118-1_20569 - (DHCP)','^[0-9]+',0);
OK
211118

^     - The beginning of a line
\d    - A digit: [0-9]
[0-9] - the characters between '0' and '9'
X+    - X, one or more times

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html


regexp_extract(string subject, string pattern, int index)
  • predefined character classes (e.g. \d) should be preceded with additional backslash (\\d)
  • index = 0 matches the whole pattern

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators

Upvotes: 1

Related Questions