sadfsa sdfasdf

Reputation: 69

Regex to match variable declaration

I need to match something like this:

int a= 4, b, c = "hi";

I already made a regex that successfully strips everything away from the line, leaving only

a= 4, b, c = "hi"

I don't care about the types of the variables, like "hi" being a String, because that will be checked later in the code.

Basically, I need to match a variable declaration with everything stripped off except the variables themselves, with or without the = part.

These are examples that should not match:

a b= 4
var,
,hello=3
=8

I have checked this question out, but it didn't really help.

I have tried this code, but it has a couple of problems: pretty much everything I listed as things that shouldn't match does match.

There may also be cases that I have missed. I need to match strings containing spaces, for example a = "hello there", but there is no requirement to match a string with a , inside it.

"Formal" definition of what a variable name can be:

Variable name can be any sequence (length > 0) of letters (uppercase or lowercase), digits and the underscore character. Name may not start with a digit. Name may start with an underscore, but in such a case it must contain at least one more character.
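
As a rough sketch, the naming rule above could be written as a standalone check like this. The names NAME_RE and is_valid_name are made up for illustration, and Python is used only as a convenient way to try the pattern out:

```python
import re

# Hypothetical pattern for the naming rule quoted above:
#   - a letter followed by any number of letters, digits or underscores, or
#   - an underscore followed by at least one more letter, digit or underscore.
NAME_RE = re.compile(r"(?:[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]+)")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string is a valid variable name."""
    return NAME_RE.fullmatch(name) is not None
```

Under this reading, a, _a1 and __ are accepted, while _, 1a and the empty string are rejected.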

Thanks for the help!

Upvotes: 1

Views: 3534

Answers (1)

Ro Yo Mi

Reputation: 15010

Description

This is based on your regex101 example. I'm not exactly clear on the other requirements, so I realize this may not completely answer your question.

"[^"]*"|((?=_[a-z_0-9]|[a-z])[a-z_0-9]+(?=\s*=))

This regular expression will do the following:

  • matches quoted strings, so that text inside quotes is not mistaken for a variable name
  • places the variable names into capture group 1; you can then iterate through the array of matches and test capture group 1 for a value, and if it's populated then the match is a name
  • requires a variable name to start with either a letter a-z, or an _ followed by at least one more character (as written, a-z only covers uppercase letters if the case-insensitive flag is set)
  • after the first character, a variable name can contain any number of a-z, _ or 0-9 characters
  • a variable name must be followed by an = sign
  • any amount of whitespace can appear between the variable name and the = sign
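
To illustrate how capture group 1 can be tested while iterating over the matches, here is a minimal sketch in Python. The helper name variable_names is made up, and the case-insensitive flag is an assumption added so that uppercase names are accepted too:

```python
import re

# The expression from above; re.IGNORECASE is an assumption so that
# uppercase letters are allowed in names as well.
PATTERN = re.compile(
    r'"[^"]*"|((?=_[a-z_0-9]|[a-z])[a-z_0-9]+(?=\s*=))',
    re.IGNORECASE,
)


def variable_names(text):
    """Collect the matches where capture group 1 is populated."""
    # Quoted strings also match, but leave group 1 empty (None),
    # so they are skipped here.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


sample = 'name = "steve", bro = "4, hi = bye", lolwot = "wait wot"'
print(variable_names(sample))  # ['name', 'bro', 'lolwot']
```

Because a name must be followed by an = sign, a bare declaration is not reported: for a= 4, b, c = "hi" this sketch returns only a and c.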

Example

Live Demo

https://regex101.com/r/aT6sC4/1

Sample text

name = "steve", bro = "4, hi = bye", lolwot = "wait wot"

Sample Matches

Note how capture group 1 only contains the variable names.

[0][0] = name
[0][1] = name

[1][0] = "steve"
[1][1] = 

[2][0] = bro
[2][1] = bro

[3][0] = "4, hi = bye"
[3][1] = 

[4][0] = lolwot
[4][1] = lolwot

[5][0] = "wait wot"
[5][1] = 

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  [^"]*                    any character except: '"' (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      _                        '_'
----------------------------------------------------------------------
      [a-z_0-9]                any character of: 'a' to 'z', '_', '0'
                               to '9'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    [a-z_0-9]+               any character of: 'a' to 'z', '_', '0'
                             to '9' (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
      =                        '='
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
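
The expression above finds names inside a line rather than validating the whole line, so the examples that should not match would still need a separate check. One possible way to do that is sketched below in Python. This is only a guess at the requirements: the names NAME, VALUE, ITEM, DECLARATION and is_declaration_list are made up, and a value is assumed to be either a quoted string or a run of characters with no whitespace, comma, quote or = sign.

```python
import re

# Assumed name rule: a letter, or an underscore plus at least one more character.
NAME = r"(?:[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]+)"
# Assumed value rule: a quoted string, or a token without whitespace , " or =
VALUE = r'(?:"[^"]*"|[^\s,"=]+)'
# One declared variable, with an optional "= value" part.
ITEM = rf"{NAME}(?:\s*=\s*{VALUE})?"
# One or more items separated by commas, with nothing else on the line.
DECLARATION = re.compile(rf"\s*{ITEM}(?:\s*,\s*{ITEM})*\s*")


def is_declaration_list(text: str) -> bool:
    """Return True if the whole text is a comma-separated list of items."""
    return DECLARATION.fullmatch(text) is not None
```

With these assumptions, a= 4, b, c = "hi" is accepted, while a b= 4, var, ,hello=3 and =8 are all rejected because the whole line has to match.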

Upvotes: 2
