Flare

Reputation: 83

How to handle spaces in values if you use spaces as a delimiter with regex?

I'm attempting to run a regex to capture the key and value of the following string:

name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5

Syntax notes for each are as follows:

I've gotten as far as handling spaces within quotes with this:

(.*?)=(?:"(.*?)"|(.*?))(?: |$)

So this will work with name="Evoke Sprite" parent="EvokeObjects" id=5

regex101 to test: https://regex101.com/r/xkRRsD/1

The problem occurs when I add ExtResource( 5 ), because it has spaces within the brackets. The regex above then fails.
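To make the failure concrete, here is a minimal sketch that runs the pattern from the question on the full string. Python and the helper name `pairs` are assumptions for illustration; the question does not name a language.

```python
import re

# Pattern from the question, unchanged.
PATTERN = re.compile(r'(.*?)=(?:"(.*?)"|(.*?))(?: |$)')

LINE = 'name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5'


def pairs(text):
    """Return (key, value) tuples; the value is whichever group took part."""
    result = []
    for m in PATTERN.finditer(text):
        quoted, bare = m.group(2), m.group(3)
        result.append((m.group(1), quoted if quoted is not None else bare))
    return result


broken = pairs(LINE)
for key, value in broken:
    print(repr(key), "->", repr(value))
```

The instance value is cut off at `ExtResource(` because the unquoted branch stops at the first space, and the leftover `5 ) id` is then picked up as the next key.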

As a possible workaround I was thinking maybe I could remove the spaces altogether from the brackets by doing a string replace in code. But I was wondering if there was a regex solution to this?
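For comparison, the string-replace workaround could look roughly like this. It is only a sketch in Python (an assumed language), and `strip_spaces_in_parens` is a hypothetical helper name. It removes all whitespace inside each non-nested pair of parentheses, then applies the original pattern.

```python
import re

# Pattern from the question, unchanged.
PATTERN = re.compile(r'(.*?)=(?:"(.*?)"|(.*?))(?: |$)')

LINE = 'name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5'


def strip_spaces_in_parens(text):
    """Drop whitespace inside every (...) span that has no nested parentheses."""
    return re.sub(
        r'\([^()]*\)',
        lambda m: re.sub(r'\s+', '', m.group(0)),
        text,
    )


normalized = strip_spaces_in_parens(LINE)
parsed = [
    (m.group(1), m.group(2) if m.group(2) is not None else m.group(3))
    for m in PATTERN.finditer(normalized)
]
print(normalized)
print(parsed)
```

Two caveats with this approach: the captured value becomes `ExtResource(5)` rather than the original `ExtResource( 5 )`, and parentheses inside quoted values would be rewritten as well.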

Upvotes: 2

Views: 97

Answers (3)

The fourth bird

Reputation: 163457

In the second part of the alternation, you match until a space or the end of the string, so for the instance value that part matches only ExtResource(

What you could do is match either a character that is not a parenthesis, or a whole span from an opening parenthesis to its closing parenthesis.

Instead of using non greedy quantifiers, you might use a negated character class.

([^=\s]+)=(?:"([^"]+)"|((?:[^\s()"]|\([^()]*\))+))

Explanation

  • ([^=\s]+)= Capture group 1, match 1+ chars other than = or a whitespace char, then match =
  • (?: Non capturing group
    • "([^"]+)" Match ", then capture 1+ chars other than " in group 2, then match "
    • | Or
    • ( Capture group 3
      • (?: Non capturing group
        • [^\s()"] Match any char except (, ), " or a whitespace char
        • | Or
        • \([^()]*\) Match from opening till closing parenthesis
      • )+ Close non capturing group and repeat 1+ times
    • ) Close group 3
  • ) Close non capturing group
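The pattern broken down above can be tried with a short Python sketch. Python is an assumption here, as is the variable name `result`; the regex itself is copied unchanged from this answer.

```python
import re

# Negated character classes instead of lazy quantifiers;
# a (...) span without nested parentheses is consumed as one unit.
PATTERN = re.compile(r'([^=\s]+)=(?:"([^"]+)"|((?:[^\s()"]|\([^()]*\))+))')

LINE = 'name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5'

result = []
for m in PATTERN.finditer(LINE):
    key = m.group(1)
    # Group 2 holds a quoted value, group 3 an unquoted one.
    value = m.group(2) if m.group(2) is not None else m.group(3)
    result.append((key, value))

for key, value in result:
    print(f"{key} = {value!r}")
```

Because `\([^()]*\)` excludes parentheses between the opening and closing one, nested calls such as `Outer( Inner( 5 ) )` are not covered by this pattern.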


Upvotes: 2

Nick Reed

Reputation: 5059

Edit: v5, this should hit all of @Andreas's test cases.

Looks like your regex is quite close, but the last alternative in your non-capturing group, (.*?), treats the space after the opening parenthesis as the end of its search, since it consumes as few characters as possible before it hits a space. Given that you know the function string will have spaces between the parentheses, this regex seems to do the trick:

(\S*?)=(?:"(.*?)"|(\S*?\(.*?\))|(\S*?))(?: |$)

Critically, \S matches any non-whitespace character. Since there's never going to be a value like id=some val, this is a good option to use, as it won't run past the parentheses in functions. It also makes sure that a key name can never contain a space, as in pare nt=val.
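As a quick check, the pattern can be exercised from Python (an assumed language; the name `found` is illustrative). It has three value groups, so the sketch takes whichever one took part in the match.

```python
import re

# Group 2: quoted value, group 3: function-style value, group 4: plain value.
PATTERN = re.compile(r'(\S*?)=(?:"(.*?)"|(\S*?\(.*?\))|(\S*?))(?: |$)')

LINE = 'name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5'

found = []
for m in PATTERN.finditer(LINE):
    # Exactly one of groups 2-4 participates in each match.
    value = next(g for g in m.groups()[1:] if g is not None)
    found.append((m.group(1), value))

for key, value in found:
    print(f"{key} = {value!r}")
```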


Upvotes: 1

Code Maniac

Reputation: 37745

You can use

([a-z]+)=(?:"(.*?)"|(.*?))(?:(?=[a-z]+?=)|$)
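A minimal Python sketch of how this pattern might be applied follows; Python and the helper `clean` are assumptions. The value here runs up to the next lowercase key= (or the end of the string), so the raw capture can carry a trailing space. Tracing the pattern by hand, the quoted branch also appears to lose to the unquoted one when a space follows the closing quote, which would leave the quotes in the capture. The sketch therefore trims both.

```python
import re

# Pattern from this answer, unchanged. Keys must be lowercase a-z.
PATTERN = re.compile(r'([a-z]+)=(?:"(.*?)"|(.*?))(?:(?=[a-z]+?=)|$)')

LINE = 'name="Evoke Sprite" parent="EvokeObjects" instance=ExtResource( 5 ) id=5'


def clean(raw):
    """Assumed post-processing: drop surrounding whitespace, then quotes."""
    return raw.strip().strip('"')


lookahead_pairs = []
for m in PATTERN.finditer(LINE):
    raw = m.group(2) if m.group(2) is not None else m.group(3)
    lookahead_pairs.append((m.group(1), clean(raw)))

for key, value in lookahead_pairs:
    print(f"{key} = {value!r}")
```

One design consequence of the lookahead: a value that itself contains lowercase text followed by = would end the match early.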



Upvotes: 0
