perrywinkle

Reputation: 383

Ruby regex counting characters

I am trying to create a regex in Ruby that matches strings containing at least 10 characters that are not special characters, i.e. characters that \w would match. So far I have come up with /\w{10,}/, but it only matches a consecutive run of word characters. I want to match any string that contains at least 10 "word" characters in total, wherever they appear. Is this possible? I am fairly new to regex, so any help would be appreciated.
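
To make the problem concrete, here is a minimal sketch. The two sample strings are made up for illustration; each contains exactly 10 word characters, but only one has them in a row:

```ruby
# Pattern from the question: requires 10 or more word characters in a row.
CONSECUTIVE = /\w{10,}/

# Hypothetical sample inputs, each containing exactly 10 word characters.
CONTIGUOUS = "abcdefghij"
SEPARATED  = "abc def-ghi j!"

puts CONSECUTIVE.match?(CONTIGUOUS)  # => true
puts CONSECUTIVE.match?(SEPARATED)   # => false, the longest run is only 3
puts SEPARATED.scan(/\w/).size       # => 10
```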

Upvotes: 0

Views: 330

Answers (2)

BroiSatse

Reputation: 44685

If I understood correctly, this should work:

/(?:\w[^\w]*){9,}\w/

Explanation:

We start with a single word character:

\w

We then want to match every other character up to the next \w, hence:

\w[^\w]*

[^<list of chars>] matches any character other than those listed in the brackets, so [^\w] means any character that is not a word character (equivalent to \W). * denotes 0 or more. Applied repeatedly to the string "a-- bc!", the above matches "a-- ", "b" and "c!".
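
As a quick check of that claim, String#scan returns every successive match of the building block (the constant names are just for illustration):

```ruby
# One word character followed by any run of non-word characters.
UNIT = /\w[^\w]*/

# scan collects each successive, non-overlapping match.
PIECES = "a-- bc!".scan(UNIT)
p PIECES  # => ["a-- ", "b", "c!"]
```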

Since we need 10 \w, we match 9 (or more) groups like that, followed by a single \w:

(\w[^\w]*){9,}\w

We don't need captures here (in Ruby a repeated group keeps only the capture from its last repetition anyway), so we make the group non-capturing:

(?:\w[^\w]*){9,}\w
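
A small sketch of how a repeated group behaves, using a deliberately simplified pattern rather than the one above: the capturing version keeps only what the group matched on its last repetition, and the non-capturing version records no groups at all.

```ruby
# Simplified illustration: one word character, repeated.
CAPTURED     = /(\w)+/.match("abc")
NON_CAPTURED = /(?:\w)+/.match("abc")

p CAPTURED[0]            # => "abc" (the whole match)
p CAPTURED[1]            # => "c"   (only the last repetition is kept)
p NON_CAPTURED[0]        # => "abc"
p NON_CAPTURED.captures  # => []
```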

Alternatively, we could use a simpler regex:

(?:\w[^\w]*){10,}

However, its match also includes any non-word characters that follow the last matched word character. I am not sure whether that matters here.
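
The difference only shows up in the matched text, not in whether a string matches. A minimal sketch with made-up sample strings:

```ruby
TRIMMED   = /(?:\w[^\w]*){9,}\w/   # match ends on a word character
WITH_TAIL = /(?:\w[^\w]*){10,}/    # match may include trailing non-word characters

# Hypothetical inputs: TEN has 10 word characters, NINE has 9.
TEN  = "a b c d e f g h i j!!!"
NINE = "a b c d e f g h i"

p TEN[TRIMMED]     # => "a b c d e f g h i j"
p TEN[WITH_TAIL]   # => "a b c d e f g h i j!!!"
p NINE[TRIMMED]    # => nil
p NINE[WITH_TAIL]  # => nil
```

If the regex is only used as a yes/no test (for example with match?), either form should do.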

Upvotes: 4

Ryszard Czech

Reputation: 18611

Match anywhere in the string:

/\w(?:\W*\w){9,19}/
/(?:\W*\w){10,20}/

Validate that the whole string contains 10 to 20 word characters:

/\A(?:\W*\w){10,20}\W*\z/

Prefer non-capturing groups, particularly when extracting found matches.

Watch out for ^ and $: in Ruby's regex they match at the start and end of each line, not of the whole string. Use \A and \z to anchor to the string itself.
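
A short sketch of why the anchors matter. The multi-line input is made up for illustration: its first line has 10 word characters, but the string as a whole has 35.

```ruby
LINE_ANCHORED   = /^(?:\W*\w){10,20}\W*$/
STRING_ANCHORED = /\A(?:\W*\w){10,20}\W*\z/

# Hypothetical input: a valid-looking first line followed by a long second line.
INPUT = "abcdefghij\n" + "x" * 25

puts LINE_ANCHORED.match?(INPUT)    # => true, the first line alone satisfies it
puts STRING_ANCHORED.match?(INPUT)  # => false, 35 word characters exceed the limit
```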

EXPLANATION

--------------------------------------------------------------------------------
  \A                       the beginning of the string
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (between 10 and
                           20 times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
    \W*                      non-word characters (all but a-z, A-Z, 0-
                             9, _) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \w                       a single word character (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  ){10,20}                 end of grouping
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  \z                       the end of the string
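
The same breakdown can be kept next to the pattern by writing it in Ruby's extended (/x) mode, where whitespace and # comments inside the regex are ignored. This is only a sketch; the constant name and sample strings are made up, and the scan-based count is a separate cross-check rather than part of the answer.

```ruby
VALIDATE = /
  \A          # beginning of the string
  (?:         # group, but do not capture
    \W*       #   zero or more non-word characters
    \w        #   exactly one word character
  ){10,20}    # repeat the group 10 to 20 times
  \W*         # trailing non-word characters
  \z          # end of the string
/x

# Hypothetical inputs with 10, 3 and 21 word characters respectively.
["abc def-ghi j!", "a b c", "x" * 21].each do |s|
  count = s.scan(/\w/).size
  puts "#{count} word characters: #{VALIDATE.match?(s)}"
end
```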

Upvotes: 1
