Joontae Kim

Reputation: 13

Regular expression to strip optional prefixes and an optional suffix from a string

Assume that we have the following strings:

  1. ABC_ANY_STRING_DEF
  2. ANY_STRING
  3. ANY_STRING_DEF
  4. ABC_CDE_ANY_STRING_DEF

"ABC_" and "CDE_" are optional prefixes: either, both, or neither may be present. Likewise, "_DEF" is an optional suffix.

In this case, can I extract ANY_STRING (an arbitrary sequence of characters) from between the prefixes and the suffix with a single regular expression?

For example, given input = "ABC_CDE_I like an apple_DEF", the output must be "I like an apple".

I tried the following patterns, but neither produces the output I expected.

re.compile(r"(?:ABC_|CDE_)*(\S+)(?:_DEF)?")

or

re.compile(r"(?:ABC_|CDE_)*(\S+)(?:_DEF)*")

Thanks a lot in advance for your advice.

Upvotes: 1

Views: 146

Answers (1)

Wiktor Stribiżew

Reputation: 627600

You may use

(?:ABC_|CDE_|^)+(\S*?)(?:_DEF|$)

See the regex demo
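
As a minimal sketch, the pattern can be tried in Python against the four sample strings from the question. The helper name `extract_core` is hypothetical, and using `re.search` is an assumption based on the question's use of the `re` module.

```python
import re

# Pattern from this answer, written as a raw string so that \S
# reaches the regex engine unchanged.
PATTERN = re.compile(r"(?:ABC_|CDE_|^)+(\S*?)(?:_DEF|$)")


def extract_core(text):
    """Return capturing group 1, or None if there is no match.

    Hypothetical helper, not part of the original answer.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    "ABC_ANY_STRING_DEF",
    "ANY_STRING",
    "ANY_STRING_DEF",
    "ABC_CDE_ANY_STRING_DEF",
]
for sample in samples:
    print(sample, "->", extract_core(sample))
```

Each of the four samples should yield ANY_STRING in group 1.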

Details

  • (?: - start of a non-capturing group that matches any of the subpatterns separated with the alternation operator |:
    • ABC_ - a literal substring ABC_
    • | - or
    • CDE_ - a literal substring CDE_
    • | - or
    • ^ - start of string
  • )+ - one or more consecutive occurrences, as many as possible (+ is a greedy quantifier)
  • (\S*?) - Capturing group 1: zero or more chars other than whitespace, but as few as possible due to the *? lazy quantifier
  • (?:_DEF|$) - either _DEF or (|) end of string ($).
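
Since \S excludes whitespace, the pattern as written should not match the question's sample input, which contains spaces. One possible adaptation, offered as an assumption rather than as part of the original pattern, replaces \S*? with .*? so that any character except a newline is allowed. The names `ORIGINAL`, `SPACED` and `extract_with_spaces` are hypothetical.

```python
import re

# Pattern as given in this answer: group 1 cannot contain whitespace.
ORIGINAL = re.compile(r"(?:ABC_|CDE_|^)+(\S*?)(?:_DEF|$)")

# Assumed variant: .*? lazily matches any characters except a newline,
# so group 1 may contain spaces.
SPACED = re.compile(r"(?:ABC_|CDE_|^)+(.*?)(?:_DEF|$)")


def extract_with_spaces(text):
    """Return group 1 of the assumed variant, or None if there is no match."""
    match = SPACED.search(text)
    return match.group(1) if match else None


text = "ABC_CDE_I like an apple_DEF"
print(ORIGINAL.search(text))      # the \S-based pattern finds no match here
print(extract_with_spaces(text))  # the .*? variant captures the text with spaces
```

Both patterns stop at the first _DEF they meet because the captured group is lazy, so this sketch assumes _DEF does not occur inside the wanted text.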

Upvotes: 2
