user3519956

Reputation: 33

How can I extract everything in between "_" characters, starting with the nth occurrence of that character from the end of the string?

How can I extract the parameters below with a regex in RE2 syntax (where not all regex features are available)?

I need to extract the following (a separate regex for each):

  1. The parameter between the third-from-last _ and the second-from-last _
  2. The parameter between the second-from-last _ and the last _
  3. The parameter between the last _ and the last . in the string
  4. Everything after the last . in the string

Let's say that we have the string below:

In this case I want to extract:

  1. ParameterA
  2. ParameterB
  3. ParameterC
  4. mp4

A couple of notes:


The best I have managed so far is to extract _ParameterA_ParameterB_ParameterC.mp4 with ((?:_[^_]*){3})$ but that's not what I need.

I also figured out how to pull ".mp4" with (\..*)$ but can't figure out how to get it without the leading dot.


I figured out how to pull "mp4" with RE2. It's ([^.]*)$.
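
For reference, here is a small sketch of what the patterns mentioned in these notes return. It uses Python's re module rather than an RE2 engine, on the assumption that these particular constructs behave the same in both. The sample string is hypothetical, since the original one is not shown above.

```python
import re

# Hypothetical sample string; only the last three parameters and the
# extension are taken from the expected results listed in the question.
SAMPLE = "Prefix1_Prefix2_ParameterA_ParameterB_ParameterC.mp4"

# Last three "_"-prefixed chunks, extension included (too much).
last_three = re.search(r"((?:_[^_]*){3})$", SAMPLE).group(1)

# Extension with the leading dot.
dotted_ext = re.search(r"(\..*)$", SAMPLE).group(1)

# Extension without the dot: everything after the last ".".
ext = re.search(r"([^.]*)$", SAMPLE).group(1)

print(last_three)  # _ParameterA_ParameterB_ParameterC.mp4
print(dotted_ext)  # .mp4
print(ext)         # mp4
```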

Upvotes: 3

Views: 214

Answers (2)

The fourth bird

Reputation: 163467

You can use a single capture group to match the first parameter and always capture the extension at the end of the string.

For the second to n parameters use optional capture groups.

If you don't want to cross newlines, you could change the character class to [^_\r\n]*

_([^_]*)(?:_([^_]*)(?:_([^_]*))?)?\.(\w+)$
  • _([^_]*) Match _ and capture 0+ times any char except _ in group 1
  • (?: Non capture group
    • _([^_]*) Match _ and capture 0+ times any char except _ in group 2
    • (?: Non capture group
      • _([^_]*) Match _ and capture 0+ times any char except _ in group 3
    • )? Close non capture group and make it optional
  • )? Close the whole non capture group and make it optional
  • \.(\w+) Match a dot and capture 1+ word chars in group 4
  • $ End of string
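
A minimal sketch of how the groups come out, run with Python's re module instead of an RE2 engine (the pattern uses no lookarounds or backreferences, so I would expect the same result there). The sample strings and the helper name extract_groups are made up for illustration.

```python
import re

PATTERN = re.compile(r"_([^_]*)(?:_([^_]*)(?:_([^_]*))?)?\.(\w+)$")

def extract_groups(name):
    """Return the four capture groups as a tuple, or None if no match."""
    match = PATTERN.search(name)
    return match.groups() if match else None

# Hypothetical inputs, not taken from the question.
full = extract_groups("Prefix1_Prefix2_ParameterA_ParameterB_ParameterC.mp4")
short = extract_groups("Prefix_OnlyOne.mp4")

print(full)   # ('ParameterA', 'ParameterB', 'ParameterC', 'mp4')
print(short)  # ('OnlyOne', None, None, 'mp4')
```

Note that when fewer than three parameters are present, the optional groups that did not take part in the match come back empty (None in Python), and group 1 holds the first parameter that was found.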


Upvotes: 2

dcsuka

Reputation: 2997

Here are four suitable regular expressions utilizing positive lookarounds. Let me know if they work:

"(?<=\_)[^_]+(?=_[^_]+_[^_]+\.)"
"(?<=\_)[^_]+(?=_[^_]+\.)"
"(?<=\_)[^_]+(?=\.)"
"(?<=\.).*$"

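Since RE2 has no lookarounds, one possible way to keep the same idea is to match the surrounding context normally and put only the wanted part in a capture group. The sketch below is my own rewrite of the four patterns along those lines, not something from the RE2 documentation. It is run with Python's re module, and the dictionary labels and the function name extract_all are made up for illustration.

```python
import re

# Sample string as used in this answer's R code (spelling kept as written).
SAMPLE = "Parameter1_Parameter3_Parameter4_ParamaterA_ParameterB_ParamaterC.mp4"

# Lookaround-free variants: the wanted text is group 1, the context
# around it is matched but not captured.
PATTERNS = {
    "third_from_end": r"_([^_]+)_[^_]+_[^_]+\.",
    "second_from_end": r"_([^_]+)_[^_]+\.",
    "first_from_end": r"_([^_]+)\.",
    "extension": r"\.([^.]*)$",
}

def extract_all(text):
    """Return {label: group 1 or None} for each pattern."""
    results = {}
    for label, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        results[label] = match.group(1) if match else None
    return results

results = extract_all(SAMPLE)
for label, value in results.items():
    print(label, "->", value)
```
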
Because Google Data Studio does not support lookarounds, here is an alternative workaround with multiple steps. It is written in R but can be translated to your language of choice:

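library(stringr)  # provides str_extract, str_replace and str_replace_all

text1 <- "Parameter1_Parameter3_Parameter4_ParamaterA_ParameterB_ParamaterC.mp4"

# Keep only the last three "_"-separated parameters plus the extension
last_three <- str_extract(text1, "[^_]+_[^_]+_[^_]+\\..+")

# 1. First of the last three parameters
str_extract(last_three, "^[^_]+")

# 2. Middle parameter, with the surrounding "_" removed
str_replace_all(str_extract(last_three, "_[^_\\.]+_"), "_", "")

# 3. Last parameter, with the trailing "." removed
str_replace(str_extract(last_three, "[^_\\.]+\\."), "\\.", "")

# 4. Extension, with the leading "." removed
str_replace(str_extract(last_three, "\\..+$"), "\\.", "")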

Google Data Studio has the required functions for this, REGEXP_EXTRACT and REGEXP_REPLACE:

https://support.google.com/datastudio/table/6379764?hl=en
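
As an example of such a translation, here is a rough Python version of the R steps, using only the standard re module. The variable names text1 and last_three mirror the R code; first, second, third and ext are labels I added.

```python
import re

text1 = "Parameter1_Parameter3_Parameter4_ParamaterA_ParameterB_ParamaterC.mp4"

# Step 1: keep only the last three parameters plus the extension.
last_three = re.search(r"[^_]+_[^_]+_[^_]+\..+", text1).group(0)

# Step 2: pull each piece out of the shortened string, then strip
# the delimiter characters that came along with the match.
first = re.search(r"^[^_]+", last_three).group(0)
second = re.search(r"_[^_.]+_", last_three).group(0).replace("_", "")
third = re.search(r"[^_.]+\.", last_three).group(0).replace(".", "", 1)
ext = re.search(r"\..+$", last_three).group(0).replace(".", "", 1)

print(first, second, third, ext)  # ParamaterA ParameterB ParamaterC mp4
```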

Upvotes: 1
