gnarlybracket

Reputation: 1716

Match first of two conditions

My problem is simple, but I've been pulling my hair out trying to solve it. I have two types of strings: one has a semicolon and the other doesn't. Both have colons.

Reason: A chosen reason
Delete: Other: testing
Reason for action: Other; testing
Blah: Other; testing;testing

If the string has a semicolon, I want to match everything after the first semicolon. If it has no semicolon, I want to match everything after the first colon. For the lines above, I should get:

A chosen reason
Other: testing
testing
testing;testing

I can get the semicolon to match by using ;(.*) and I can get the colon to match by using :(.*).

I tried an alternation, ;(.*)|:(.*), thinking that if I got the order right it would match the semicolon first and fall back to the colon when there is no semicolon, but it always matched the colon.

What am I doing wrong?
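The behaviour in the question comes from the leftmost-match rule: the engine tries every alternative at position 0, then at position 1, and so on, and reports the first position where any branch succeeds. Alternation order only decides between branches that match at the same starting position. A minimal sketch of this, using Python's `re` as a stand-in for PostgreSQL's engine (both report the leftmost match):

```python
import re

line = "Reason for action: Other; testing"

# Each pattern on its own finds something.
print(repr(re.search(r";(.*)", line).group(1)))  # ' testing'
print(repr(re.search(r":(.*)", line).group(1)))  # ' Other; testing'

# Combined: the ':' at index 17 is reached before the ';' at index 24,
# so the second branch wins even though ';' is listed first.
m = re.search(r";(.*)|:(.*)", line)
print(m.start(), m.group(1), repr(m.group(2)))  # 17 None ' Other; testing'
```

So the fix is not reordering the branches but making the pattern itself check whether a semicolon exists, which is what the answers below do.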

Edit

I added another test case above to match the requirements I stated. For strings with no semicolon, it should match everything after the first colon.

Also, "Reason" could be anything, so I've reflected that in the test cases as well.

Second Edit

To clarify, I'm using POSIX regular expressions (in PostgreSQL).

Upvotes: 2

Views: 156

Answers (4)

The fourth bird

Reputation: 163632

One option is to use an alternation that first checks whether the string contains no ;. If it doesn't, match up to the first : and capture the rest in group 1.

If there is a ;, match up to the first semicolon and capture the rest in group 1.

For the logic stated in the question:

  • If the string has a semicolon, I want to match anything after the first one.
  • If it has no semicolon, I want to match everything after the first colon

You could use:

^(?:(?!.*;)[^\r\n:]*:|[^;\r\n]*;)[ \t]*(.*)$

Explanation

  • ^ Start of string
  • (?: Non capturing group
    • (?!.*;) Negative lookahead (supported by PostgreSQL), assert the string does not contain ;
    • [^\r\n:]*: If that is the case, match 0+ characters other than : or a newline, then match :
    • | Or
    • [^;\r\n]*; Match 0+ characters other than ; or a newline, then match ;
  • ) Close non capturing group
  • [ \t]* Match 0+ spaces or tabs
  • (.*) Capturing group 1, match any char 0+ times
  • $ End of string
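A quick check of this pattern against the four sample lines from the question. This is a sketch using Python's `re` as a stand-in, since it supports the same lookahead and bracket syntax; in PostgreSQL you would apply the pattern itself, for example with `regexp_match`, and read capture group 1.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^(?:(?!.*;)[^\r\n:]*:|[^;\r\n]*;)[ \t]*(.*)$")

samples = [
    "Reason: A chosen reason",
    "Delete: Other: testing",
    "Reason for action: Other; testing",
    "Blah: Other; testing;testing",
]

# Group 1 holds the text after the first ';' (or after the first ':' if there is no ';').
results = [PATTERN.match(s).group(1) for s in samples]
print(results)
```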

Regex demo | PostgreSQL demo

Upvotes: 1

Emma

Reputation: 27743

My guess is that you might want an expression similar to:

:\s*(?:[^;\r\n]*;)?\s*(.*)$

Demo
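A sketch checking this pattern on the sample lines, again with Python's `re` as a stand-in. The pattern is unanchored, so `re.search` is used; it starts at the first colon and then optionally skips up to and including the next semicolon. The function name `tail` is illustrative only.

```python
import re

# The pattern from this answer, unchanged; not anchored, so use search().
PATTERN = re.compile(r":\s*(?:[^;\r\n]*;)?\s*(.*)$")


def tail(line: str) -> str:
    return PATTERN.search(line).group(1)


samples = [
    "Reason: A chosen reason",
    "Delete: Other: testing",
    "Reason for action: Other; testing",
    "Blah: Other; testing;testing",
]

print([tail(s) for s in samples])
```

Because matching begins at the first colon, this relies on the colon coming before the semicolon, as it does in every sample; a line such as "a;b: c" yields "c" rather than "b: c".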

Upvotes: 4

palvarez

Reputation: 1598

Here is a fast regex (233 steps) with no lookaheads.

.*?:\s*(?:([^\n;]+)|.*?;\s*(.*))$

Check out the regex at https://regex101.com/r/9gbpjW/3

UPDATED: now matches any label, not just "Reason".
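This pattern puts the result in group 1 when there is no semicolon and in group 2 otherwise, so the caller has to pick whichever group participated. A minimal Python sketch of that, with the hypothetical helper name `tail`. Note that Python's `re` is only a stand-in here: PostgreSQL's engine has its own rules for patterns that mix greedy and non-greedy quantifiers such as `.*?`, so results there should be confirmed in PostgreSQL itself.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r".*?:\s*(?:([^\n;]+)|.*?;\s*(.*))$")


def tail(line: str) -> str:
    m = PATTERN.match(line)
    # Group 2 is only set when the ';' branch matched; otherwise use group 1.
    return m.group(2) if m.group(2) is not None else m.group(1)


samples = [
    "Reason: A chosen reason",
    "Delete: Other: testing",
    "Reason for action: Other; testing",
    "Blah: Other; testing;testing",
]

print([tail(s) for s in samples])
```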

Upvotes: 1

dhanlin

Reputation: 145

regex = .*?:(?(?!.*;)(.*)|.*?;(.*))

demo

Upvotes: 0
