user12246099

Why does my regex have strange behavior? Is this a bug?

I have some problems with my regex. I would like to retrieve the string before a comment. This string can be surrounded by quotation marks or not.

If a # appears inside quotes, I want it to be treated as part of the string and not as the start of a comment.

Here is my regular expression:

[\"']?(.*?)[\"']?\s*(#.*)

Here are some examples that work with this regex:

"test" # comment    ---> group1: test   group2: # comment
test # comment      ---> group1: test   group2: # comment

Here's what I'm having trouble with and I do not understand:

"t#est" # comment   ---> group1: t      group2: #est" # comment

I want group1: t#est group2: # comment
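The behavior can be reproduced with a minimal sketch. It assumes Python's `re` module (the question does not name a language) and writes the pattern without the `\"` string escapes; the `results` list is only there for illustration.

```python
import re

# The pattern from the question, as a Python raw string.
PATTERN = re.compile(r"""["']?(.*?)["']?\s*(#.*)""")

lines = ['"test" # comment', 'test # comment', '"t#est" # comment']

results = []
for line in lines:
    m = PATTERN.match(line)
    results.append((m.group(1), m.group(2)))
    print(repr(line), "->", repr(m.group(1)), repr(m.group(2)))

# The lazy (.*?) stops as soon as the rest of the pattern can match.
# Because the closing quote is optional, the first # after "t already
# satisfies (#.*), so group 1 ends up as just 't'.
```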

My regex on regex101

Thank you in advance for your help.

Upvotes: 1

Views: 57

Answers (2)

orlp

Reputation: 117681

You made the opening and closing quotes optional, but independently of each other. Either both should be there or neither should be. They should also match: "a' is not a proper string.

A string surrounded by quotes is \"[^\"]*\"|'[^']*'. A string without quotes preceding a comment is [^#]*.

This makes our total regex:

(\"[^\"]*\"|'[^']*'|[^#]*)\s*#(.*)
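As a rough sketch of how this pattern could be used, assuming Python's `re` module: note that group 1 still contains the surrounding quotes (or trailing whitespace for an unquoted value), and group 2 holds the comment text without the `#`. The helper `split_line` is a hypothetical name, not part of the answer.

```python
import re

# The combined pattern from above, as a Python raw string.
PATTERN = re.compile(r"""("[^"]*"|'[^']*'|[^#]*)\s*#(.*)""")

def split_line(line):
    """Return (value, comment) or None if the line has no comment."""
    m = PATTERN.match(line)
    if m is None:
        return None
    value = m.group(1).strip()
    # Drop one pair of matching quotes, if the value is quoted.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return value, m.group(2).strip()

print(split_line('"t#est" # comment'))
print(split_line('test # comment'))
```

Stripping the quotes in code rather than in the regex keeps the pattern simple, at the cost of one extra step after matching.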

Upvotes: 1

CertainPerformance

Reputation: 370699

You should capture the initial quote (if any) and then use a backreference later to ensure that if an opening ' or " was matched, the same character is required at the end of the match, before the #:

([\"']?)(.*?)\1\s*(#.*)

https://regex101.com/r/Rpb5wL/1

(Note that since the initial quote is now captured, you'll have to change the code that uses the resulting groups to account for that; e.g., the # part will now be in the 3rd group, not the 2nd group.)
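A minimal sketch of the backreference version, assuming Python's `re` module (where `\1` has the same meaning); the function name `value_and_comment` is made up for illustration. It reads the value from group 2 and the comment from group 3, reflecting the shifted group numbers.

```python
import re

# Group 1: optional opening quote; \1 requires the same quote
# (or nothing, if group 1 matched the empty string) to close the value.
PATTERN = re.compile(r"""(["']?)(.*?)\1\s*(#.*)""")

def value_and_comment(line):
    """Return (value, comment) or None if the pattern does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(2), m.group(3)

print(value_and_comment('"t#est" # comment'))
print(value_and_comment('test # comment'))
```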

Upvotes: 2
