Pablo

Understanding regex criteria in pattern match

I am trying to determine what the following pattern match criteria allows me to enter:

\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)

From my attempt to decipher it (by looking at the regexes I do understand), it seems to say I can start with any character sequence, then I must have a brace followed by alphanumerics, then another sequence followed by braces, one initial single quote, no backslashes, closed by a brace?

Sorry if I have got this completely muddled. Any help is appreciated.

Regards, Pablo

Upvotes: 1

Views: 618

Answers (5)

Alan Moore

Reputation: 75242

Yes, you have got it completely muddled. :P For one thing, there are no braces in that regex; that word usually refers to the curly brackets: {}. That regex only contains square brackets and parentheses (aka round brackets), and they're all regex metacharacters--they aren't meant to match those characters literally. The same goes for most of the other characters.

You might find this site useful. Very good tutorial and reference site for all things regex.

Upvotes: 0

too much php

Reputation: 91038

It's looking for strings of text which are basically

<identifier> = <value>
  • identifier is made up of letters, digits, '_', '-' and '.'

  • value can be a single-quoted string, a double-quoted string, or any other sequence of non-whitespace characters.

So it would match lines that look like this:

foo = 1234
bar-bar= "a double-quoted string"
bar.foo-bar ='a single quoted string'
   .baz      =stackoverflow.com this part is ignored

Some things to note:

  • There's no way to put a quote inside a quoted string (such as using \" inside "...").
  • Anything after the quoted string is ignored.
  • If a quoted string isn't used for value, then everything from the first whitespace character onwards is ignored.
  • Whitespace around the identifier and the equals sign is optional.
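
A quick way to check the examples above is to run them through the pattern. This is only a minimal sketch in Python, assuming Python's re module treats this pattern the same way as whatever regex flavor the question came from; PATTERN and parse_pair are names made up for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")

def parse_pair(line):
    """Return (identifier, value) for the first match in line, or None."""
    m = PATTERN.search(line)
    return m.groups() if m else None

lines = [
    "foo = 1234",
    'bar-bar= "a double-quoted string"',
    "bar.foo-bar ='a single quoted string'",
    "   .baz      =stackoverflow.com this part is ignored",
]

for line in lines:
    print(parse_pair(line))
```

Note that the surrounding quotes are part of the captured value, so they would have to be stripped separately if only the text inside them is wanted.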

Upvotes: 1

Let us break \s*([\w\.-]+)\s*=\s*('[^']*'|\"[^\"]*\"|[^\s]+) apart:

\s*([\w\.-]+)\s*:

  • \s* means 0 or more whitespace characters
  • [\w.-]+ means 1 or more of the following characters: A-Za-z0-9_.-

('[^']*'|\"[^\"]*\"|[^\s]+):

  • Zero or more non-' characters enclosed in ' and '.
  • Zero or more non-" characters enclosed in " and ".
  • One or more non-whitespace characters.

So basically, you can mostly ignore the \s*'s when trying to understand the expression; they just allow for optional whitespace.
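
The same breakdown can be written into the pattern itself. The following is a sketch using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments; it assumes Python's regex flavor, and VERBOSE_PATTERN and COMPACT_PATTERN are hypothetical names chosen for this example.

```python
import re

# Same expression, spread out with one comment per piece.
VERBOSE_PATTERN = re.compile(r"""
    \s*              # optional leading whitespace
    ([\w.-]+)        # group 1: one or more of A-Za-z0-9_ . -
    \s* = \s*        # equals sign with optional whitespace around it
    (                # group 2: the value, one of three forms
        '[^']*'      #   zero or more non-' characters inside '...'
      | "[^"]*"      #   zero or more non-" characters inside "..."
      | [^\s]+       #   one or more non-whitespace characters
    )
""", re.VERBOSE)

# The one-line form from the question, for comparison.
COMPACT_PATTERN = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")

for text in ["foo = 1234", "empty = ''", 'a.b-c="x y" trailing']:
    print(VERBOSE_PATTERN.search(text).groups())
```

The second sample shows that an empty quoted string is accepted, because the quoted alternatives use * rather than +.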

Upvotes: 0

Laurence Gonsalves

Reputation: 143274

The square brackets are character classes, and the parens are for grouping. I'm not sure what you mean by "braces".

This basically matches a name=value pair where the name consists of one or more "word", dot or hyphen characters, and the value is either a single-quoted string, a double-quoted string, or a bunch of non-whitespace characters. Single-quoted strings cannot contain a single quote, and double-quoted strings cannot contain double quotes (both arguably minor flaws in whatever syntax this is from). There's also arguably some ambiguity, since the last option ("a bunch of non-whitespace characters") could match something starting with a single or double quote.

Also, zero or more whitespace characters may appear around the equals sign or at the beginning (that's what the \s* bits are for).
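
To make the quoting limits and the ambiguity concrete, here is a small sketch in Python. It assumes Python's re module handles this pattern like the original flavor does, and value_of is a helper name invented for the example.

```python
import re

PATTERN = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")

def value_of(line):
    """Return the captured value (group 2), or None if nothing matches."""
    m = PATTERN.search(line)
    return m.group(2) if m else None

# No closing quote: both quoted alternatives fail, so the
# non-whitespace alternative takes over and stops at the first space.
print(value_of("name = 'unterminated value"))

# A backslash does not escape the quote: the value ends at the first
# double quote that follows the opening one.
print(value_of(r'msg = "say \"hi\" now"'))

# \s also matches newlines, so the pair may be split across lines.
print(value_of("key\n=\n'v'"))
```

In the first case the captured value begins with a quote character even though it is not a quoted string, which is the ambiguity mentioned above.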

Upvotes: 2

Scott Evernden

Reputation: 39966

RegexBuddy says:

\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)

Options: case insensitive

Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «([\w\.-]+)»
   Match a single character present in the list below «[\w\.-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A word character (letters, digits, etc.) «\w»
      A . character «\.»
      The character “-” «-»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 2 «('[^']*'|"[^"]*"|[^\s]+)»
   Match either the regular expression below (attempting the next alternative only if this one fails) «'[^']*'»
      Match the character “'” literally «'»
      Match any character that is NOT a “'” «[^']*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “'” literally «'»
   Or match regular expression number 2 below (attempting the next alternative only if this one fails) «"[^"]*"»
      Match the character “"” literally «"»
      Match any character that is NOT a “"” «[^"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “"” literally «"»
   Or match regular expression number 3 below (the entire group fails if this one fails to match) «[^\s]+»
      Match a single character that is a “non-whitespace character” «[^\s]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»


Created with RegexBuddy

Upvotes: 0
