vamsiampolu

Reputation: 6622

Regex: Match all hyphens or underscores not at the beginning or the end of the string

I am writing some code that needs to convert a string to camel case. However, I want to allow any _ or - at the beginning or end of the string.

I have had success matching an _ character using this regex:

^(?!_)(\w+)_(\w+)(?<!_)$

when the inputs are:

pro_gamer #matched
#ignored
_proto 
proto_
__proto
proto__
__proto__
#matched as nerd_godess_of, skyrim
nerd_godess_of_skyrim

I recursively apply my method to the first capture group if it still contains a separator, as in nerd_godess_of.

I am having trouble adding - matches to the same regex. I assumed that just adding a - to the mix like this would work:

^(?![_-])(\w+)[_-](\w+)(?<![_-])$

and it matches like this:

super-mario #matched
eslint-path #matched
eslint-global-path #NOT MATCHED.

I would like to understand why the regex fails to match the last case given that it worked correctly for the _.

The (almost) full set of test inputs can be found here

Upvotes: 2

Views: 2128

Answers (3)

Cary Swoveland

Reputation: 110685

The regex

^(?![_-])(\w+)[_-](\w+)(?<![_-])$

does not match "eslint-global-path" because the anchors ^ and $ force it to cover the entire line, while it allows only one separator between two runs of word characters. This regex reads, "Match the beginning of the line, not followed by a hyphen or underscore, then match one or more word characters (including underscores) in a capture group, a hyphen or underscore, and then one or more word characters in a second capture group. Lastly, require that the end of the line is not preceded by a hyphen or underscore." With two underscores the first \w+ can absorb one of them; with two hyphens it cannot, so the text after the second hyphen is left unmatched.

The fact that an underscore (but not a hyphen) is a word (\w) character completely messes up the regex. In general, rather than using \w, you might want to use \p{Alpha} or \p{Alnum} (or POSIX [[:alpha:]] or [[:alnum:]]).
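To see the effect concretely, here is a small Ruby check that runs the question's pattern over a few of its inputs. It is only a sketch, and the variable name pattern is chosen for illustration. Because \w matches an underscore, the first \w+ can swallow every underscore but the last; it cannot swallow a hyphen, so a string with two hyphens leaves text that the anchored pattern cannot consume.

```ruby
# The pattern from the question, unchanged.
pattern = /^(?![_-])(\w+)[_-](\w+)(?<![_-])$/

# \w covers letters, digits and the underscore, but not the hyphen.
puts "_".match?(/\w/)   # true
puts "-".match?(/\w/)   # false

%w[nerd_godess_of_skyrim super-mario eslint-global-path].each do |input|
  m = pattern.match(input)
  puts "#{input}: #{m ? m.captures.inspect : 'no match'}"
end
# nerd_godess_of_skyrim: ["nerd_godess_of", "skyrim"]
# super-mario: ["super", "mario"]
# eslint-global-path: no match
```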

Try this.

r = /
    (?<=     # begin a positive lookbehind
      [^_-]  # match a character other than an underscore or hyphen
    )        # end positive lookbehind
    (        # begin capture group 1
      (?:    # begin a non-capture group
        -+   # match one or more hyphens
        |    # or
        _+   # match one or more underscores
      )      # end non-capture group
      [^_-]  # match any character other than an underscore or hyphen
    )        # end capture group 1
    /x       # free-spacing regex definition mode

'_cats_have--nine_lives--'.gsub(r) { |s| s[-1].upcase }
  #=> "_catsHaveNineLives--"

This regex is conventionally written as follows.

r = /(?<=[^_-])((?:-+|_+)[^_-])/
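Wrapped in a method, the substitution can be tried against the inputs from the question. This is only a sketch: the method name camelize is made up for illustration, and it assumes each input is a single token with no whitespace.

```ruby
# Hypothetical helper (the name is illustrative, not from the question).
# Replaces each interior run of hyphens or underscores, together with the
# character that follows it, by that character upcased. Leading and trailing
# runs are left alone because of the lookbehind and the required trailing
# character.
def camelize(str)
  str.gsub(/(?<=[^_-])((?:-+|_+)[^_-])/) { |s| s[-1].upcase }
end

puts camelize('eslint-global-path')     # eslintGlobalPath
puts camelize('nerd_godess_of_skyrim')  # nerdGodessOfSkyrim
puts camelize('__proto__')              # __proto__
puts camelize('_proto')                 # _proto
puts camelize('proto_')                 # proto_
```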

If all the letters are lower case, one could alternatively write

'_cats_have--nine_lives--'.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/).
  map(&:capitalize).join
  #=> "_catsHaveNineLives--"

where

'_cats_have--nine_lives--'.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/)
  #=> ["_cats", "have", "nine", "lives--"]

(?=[^_-]) is a positive lookahead that requires the characters on which the split is made to be followed by a character other than an underscore or hyphen.
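One thing to keep in mind with the split version: capitalize is applied to every piece, including the first, so an input that does not start with an underscore or hyphen comes out with a leading capital. It also downcases the rest of each piece, which is why the lower-case assumption matters. Below is a sketch of a variant that leaves the first piece untouched; the method name camelize_by_split is hypothetical.

```ruby
# Hypothetical helper: split on interior separator runs, keep the first
# piece as is, and capitalize only the remaining pieces.
def camelize_by_split(str)
  first, *rest = str.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/)
  [first, *rest.map(&:capitalize)].join
end

sep = /(?<=[^_-])(?:_+|-+)(?=[^_-])/
puts 'eslint-global-path'.split(sep).map(&:capitalize).join  # EslintGlobalPath
puts camelize_by_split('eslint-global-path')                 # eslintGlobalPath
puts camelize_by_split('_cats_have--nine_lives--')           # _catsHaveNineLives--
```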

Upvotes: 4

marvel308

Reputation: 10458

You can try the regex

^(?=[^-_])(\w+[-_]\w*)+(?=[^-_])\w$

See the demo here.

Upvotes: 0

Mischa

Reputation: 2298

Switch _- to -_ so that - is not treated as a range op, as in a-z.
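For reference, a hyphen that appears first or last inside a character class is taken literally in Ruby, so the two orderings can be compared directly. A quick check, offered as a sketch with illustrative variable names:

```ruby
# A hyphen at the very start or end of a character class is a literal
# hyphen, not a range operator, so these two classes accept the same input.
a = /\A[_-]\z/
b = /\A[-_]\z/

%w[_ - a ^ .].each do |ch|
  puts "#{ch.inspect}: #{a.match?(ch)} #{b.match?(ch)}"
end
# "_": true true
# "-": true true
# "a": false false
# "^": false false
# ".": false false
```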

Upvotes: -1
