Reputation: 37
I want to extract the word between '/bla-bla-bla/' and '.a12345' in the URL (here, "this-is-the-word")
using regexp_extract in Hive.
INPUT: www.website.com/bla-bla-bla/this-is-the-word.a12345.anotherword.blabla
DESIRED OUTPUT: this-is-the-word
I've tried the two expressions below, but neither worked. What regex will produce the desired output from this input?
regexp_extract(URL,'^.*[/]bla[-]bla[-]bla[/]([a-z]+)\\.(a([0-9]+))*$',1)
regexp_extract(URL,'^.*[/]bla-bla-bla[/]([a-z]*)[.]a([0-9]+)*$',1)
Upvotes: 1
Views: 955
Reputation: 626699
You may use
regexp_extract(URL,'^.*/bla-bla-bla/([^/.]+)\\.a[0-9].*$', 1)
See this regex demo
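If no Hive session is at hand, the pattern can be sanity-checked elsewhere. The following is a minimal sketch in Python, assuming its re module treats these particular constructs the same way as the Java regex engine Hive uses. The helper name extract_word is hypothetical, and returning an empty string on a non-match is a choice made for this sketch, not a statement about Hive.

```python
import re

# Regex-level pattern from the answer, i.e. what the engine sees once the
# string literal has been unescaped (a single backslash before the dot).
PATTERN = re.compile(r'^.*/bla-bla-bla/([^/.]+)\.a[0-9].*$')


def extract_word(url: str) -> str:
    """Return capture group 1, or '' when the pattern does not match.

    Hypothetical helper: the '' fallback is an assumption of this sketch.
    """
    match = PATTERN.search(url)
    return match.group(1) if match else ''


url = 'www.website.com/bla-bla-bla/this-is-the-word.a12345.anotherword.blabla'
print(extract_word(url))  # this-is-the-word
```

The character class [^/.]+ is what lets the hyphens through: it accepts anything except a slash or a dot, whereas [a-z]+ from the question stops at the first hyphen.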
The pattern matches:
^ - start of string
.* - any 0+ chars other than line break chars, as many as possible
/bla-bla-bla/ - a literal /bla-bla-bla/ substring
([^/.]+) - Group 1 (what you will get, since the next argument is 1): 1 or more chars other than / and .
\.a - a literal .a substring (the backslash has to be doubled inside a Hive string literal)
[0-9] - a digit
.*$ - the rest of the string to its end.
Upvotes: 2