Reputation: 33
How can I extract the parameters below with a regex using RE2 syntax (where not all regex features are available)?
What I need to extract (a different regex for each):
- "_" from the end of the string and until the second occurrence of "_" (from the end of the string)
- "_" from the end of the string and until the first occurrence of "_" (from the end of the string)
- "_" from the end of the string and until the first occurrence of "." in the string (from the end of the string)
- "." from the end of the string
Let's say that we have the string below:
In this case I want to extract:
Couple of notes:
The best I have managed so far is to extract _ParameterA_ParameterB_ParameterC.mp4 with ((?:_[^_]*){3})$ but that's not what I need.
I also figured out how to pull ".mp4" with (\..*)$ but can't figure out how to get it without the ".".
I figured out how to pull "mp4" with RE2. It's ([^.]*)$
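For reference, the three attempts above can be reproduced with a small Python sketch. Python's re module is not RE2, but these patterns only use syntax that RE2 also accepts. The full sample string is an assumption: only its tail comes from the file name quoted above.

```python
import re

# Assumed sample input; only its tail (_ParameterA_ParameterB_ParameterC.mp4)
# is taken from the question.
sample = "Parameter1_Parameter3_Parameter4_ParameterA_ParameterB_ParameterC.mp4"

# The three patterns tried so far, each with a single capture group.
attempts = {
    "last three parameters": r"((?:_[^_]*){3})$",
    "extension with dot": r"(\..*)$",
    "extension without dot": r"([^.]*)$",
}

# Run each attempt and keep what its capture group returns.
captured = {
    label: re.search(pattern, sample).group(1)
    for label, pattern in attempts.items()
}

for label, value in captured.items():
    print(f"{label}: {value}")
```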
Upvotes: 3
Views: 214
Reputation: 163467
You can use a single capture group to match the first parameter and always capture the extension at the end of the string.
For the second to n parameters use optional capture groups.
If you don't want to cross newlines, you could change the character class to [^_\r\n]*
_([^_]*)(?:_([^_]*)(?:_([^_]*))?)?\.(\w+)$
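To see what each group captures, here is a minimal sketch in Python. Python's re module is not RE2, but this pattern only uses syntax that RE2 also accepts. The sample file name and the helper name extract_parameters are assumptions made for illustration.

```python
import re

# Pattern from this answer: one to three "_"-prefixed parameters, then the extension.
PATTERN = re.compile(r"_([^_]*)(?:_([^_]*)(?:_([^_]*))?)?\.(\w+)$")


def extract_parameters(name):
    """Return (group 1, group 2, group 3, extension), or None if nothing matches."""
    match = PATTERN.search(name)
    return match.groups() if match else None


# Hypothetical sample input modelled on the question's file name.
sample = "Parameter1_Parameter3_Parameter4_ParameterA_ParameterB_ParameterC.mp4"
print(extract_parameters(sample))
```

When the name holds fewer than three parameters, the optional groups that did not take part in the match come back as None in Python.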
- _([^_]*) Match _ and capture 0+ times any char except _ in group 1
- (?: Non-capturing group
- _([^_]*) Match _ and capture 0+ times any char except _ in group 2
- (?: Non-capturing group
- _([^_]*) Match _ and capture 0+ times any char except _ in group 3
- )? Close the non-capturing group and make it optional
- )? Close the whole non-capturing group and make it optional
- \.(\w+) Match a dot and capture 1+ word chars in group 4
- $ End of string
Upvotes: 2
Reputation: 2997
Here are four suitable regular expressions utilizing positive lookarounds. Let me know if they work:
"(?<=\_)[^_]+(?=_[^_]+_[^_]+\.)"
"(?<=\_)[^_]+(?=_[^_]+\.)"
"(?<=\_)[^_]+(?=\.)"
"(?<=\.).*$"
As Google Data Studio does not support lookarounds, here is an alternative multi-step workaround. It is written in R (using the stringr package) but can be translated to your language of choice:
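library(stringr)  # provides str_extract, str_replace and str_replace_all
text1 <- "Parameter1_Parameter3_Parameter4_ParameterA_ParameterB_ParameterC.mp4"
# Keep only the last three "_"-separated parameters plus the extension
last_three <- str_extract(text1, "[^_]+_[^_]+_[^_]+\\..+")
str_extract(last_three, "^[^_]+")  # first of the three parameters
str_replace_all(str_extract(last_three, "_[^_\\.]+_"), "_", "")  # second parameter, underscores stripped
str_replace(str_extract(last_three, "[^_\\.]+\\."), "\\.", "")  # third parameter, trailing dot stripped
str_replace(str_extract(last_three, "\\..+$"), "\\.", "")  # extension, leading dot stripped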
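As one possible translation of the steps above, here is a rough Python sketch. It assumes the same kind of sample string, and the variable names first, second, third and extension are made up for illustration. None of the patterns needs a lookaround, so the same two-step idea (extract, then strip the delimiter) should carry over to an RE2-based tool.

```python
import re

# Assumed sample input, spelled as in the question's file name.
text1 = "Parameter1_Parameter3_Parameter4_ParameterA_ParameterB_ParameterC.mp4"

# Step 1: keep only the last three "_"-separated parameters plus the extension.
last_three = re.search(r"[^_]+_[^_]+_[^_]+\..+", text1).group(0)

# Step 2: extract each piece, then strip the delimiter matched along with it.
first = re.search(r"^[^_]+", last_three).group(0)
second = re.search(r"_[^_.]+_", last_three).group(0).replace("_", "")
third = re.search(r"[^_.]+\.", last_three).group(0).replace(".", "", 1)
extension = re.search(r"\..+$", last_three).group(0).replace(".", "", 1)

print(first, second, third, extension)
```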
Google Data Studio has the required functions for this: REGEXP_EXTRACT and REGEXP_REPLACE.
https://support.google.com/datastudio/table/6379764?hl=en
Upvotes: 1