Reputation: 43
Hi I'm trying to parse large URL's from a log using Hive.
There is a particular value that I want to extract from the URL (strategy=??) the values can be hyphenated, but not always.
I built this sample query, but it just returns nothing.
What am I doing wrong?
select regexp_extract('234=23234&werw=asdf&strategy=retargeting&asdf=fds23', '(strategy=)([-\w*]*)',2) from vt;
so the value I'm expecting to get is retargeting from this partial URL string. 234=23234&werw=asdf&strategy=retargeting&asdf=fds23
Any help is greatly appreciated!!!
Upvotes: 0
Views: 4326
Reputation: 411
I believe this regex will work for you:
strategy=((\w-?)+)
Here's a RegExr link: http://regexr.com?35sbl. After matching, group 1 contains the value of strategy
. Note that this regex will match any number of hyphens in the value. It also fails if the hyphen is the first character (though, in my opinion, a leading hyphen does not make a value 'hyphenated').
From what I can tell, your method didn't return anything because of the way group 2 is set up: you have [-\w*]
, which says "match a hyphen and then any number of alphanumeric characters (including 0)". You could also rewrite this as [-?\w*]*
, which says "match or don't match a hyphen, and then any number of alphanumeric characters (including 0)". However, this would then match just a hyphen, as in this case
strategy=-
Which is not something you want, I would think. A slightly safer way might be setting group 2 to [-?\w+]+
, which would require at least one \w
character after the equals sign. Happy coding! :)
Upvotes: 0