epochwolf

Reputation: 12772

Regex: Match a string containing numbers and letters but not a string of just numbers

Question

I would like to be able to use a single regex (if possible) to require that a string fits [A-Za-z0-9_] but doesn't allow:

Valid

Invalid

Reasons for the Rules

The purpose of this is to filter usernames for a website I'm working on. I've arrived at the rules for specific reasons.

Upvotes: 3

Views: 5329

Answers (9)

Amber

Reputation: 526573

This matches exactly what you want:

/\A(?!_)(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*(?<!_)\z/i
  1. At least one alphabetic character (the [a-z] in the middle).
  2. Does not begin or end with an underscore (the (?!_) and (?<!_) at the beginning and end).
  3. May have any number of numbers, letters, or underscores before and after the alphabetic character, but every underscore must be separated by at least one number or letter (the rest).

Edit: In fact, you probably don't even need the lookahead/lookbehinds due to how the rest of the regex works - the first ?: parenthetical won't allow an underscore until after an alphanumeric, and the second ?: parenthetical won't allow an underscore unless it's before an alphanumeric:

/\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

Should work fine.
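
A quick way to sanity-check the simplified pattern is to run it over a few names. This is only a sketch: the sample usernames are my own assumptions rather than the question's test cases, and `AMBER_USERNAME` is just a label picked for the example.

```ruby
# Simplified pattern from the edit above (no lookahead/lookbehind).
AMBER_USERNAME = /\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

# Assumed sample names: the first four should pass, the last four should fail.
samples = %w[abc a_b 1_a 12a34 123 _abc abc_ a__b]

samples.each do |name|
  verdict = AMBER_USERNAME.match?(name) ? 'valid' : 'invalid'
  puts "#{name}: #{verdict}"
end
```

The middle `[a-z]` is what forces at least one letter, so `123` fails even though every character is individually allowed.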

Upvotes: 8

glenn mcdonald

Reputation: 15488

The question asks for a single regexp, and implies that it should be a regexp that matches, which is fine, and answered by others. For interest, though, I note that these rules are rather easier to state directly as a regexp that should not match. I.e.:

x !~ /[^A-Za-z0-9_]|^_|_$|__|^\d+$/
  • no other characters than letters, numbers and _
  • can't start with a _
  • can't end with a _
  • can't have two _s in a row
  • can't be all digits

You can't use it this way in a Rails validates_format_of, but you could put it in a validate method for the class, and I think you'd have a much better chance of still being able to make sense of what you meant a month or a year from now.
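
As a rough sketch of that idea in plain Ruby (no Rails), the rejection pattern can sit behind a small predicate. The names `INVALID_USERNAME` and `valid_username?` are hypothetical, chosen only for this example.

```ruby
# Any one of these alternatives makes a name invalid:
# a disallowed character, a leading _, a trailing _, a double _, or all digits.
INVALID_USERNAME = /[^A-Za-z0-9_]|^_|_$|__|^\d+$/

# Hypothetical helper: true when none of the forbidden patterns is found.
def valid_username?(name)
  name !~ INVALID_USERNAME
end

%w[abc a_1 123 _a a_ a__b a-b].each do |name|
  puts "#{name}: #{valid_username?(name)}"
end
```

One thing to watch: an empty string contains none of the forbidden patterns, so it passes this check; a separate presence or length check would be needed for that.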

Upvotes: 1

Alan Moore

Reputation: 75222

/^(?![\d_]+$)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$/

Your question is essentially the same as this one, with the added requirement that at least one of the characters has to be a letter. The negative lookahead - (?![\d_]+$) - takes care of that part, and is much easier (both to read and write) than incorporating it into the basic regex as some others have tried to do.
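
To see the lookahead doing its job, here is a small sketch. One change that is mine and not part of the answer: `^` and `$` are swapped for `\A` and `\z`, because in Ruby `^` and `$` match at line boundaries, so the original form would accept a valid first line followed by a newline and anything else. The sample names and the constant name `LETTER_REQUIRED` are also assumptions.

```ruby
# Same structure as the pattern above, anchored to the whole string (my change).
LETTER_REQUIRED = /\A(?![\d_]+\z)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*\z/

# Names made only of digits and underscores are stopped by the lookahead;
# the remaining failures are stopped by the underscore-separated body.
%w[a 1_a abc_def 123 1_2 _a a_ a__b].each do |name|
  puts format('%-8s %s', name, LETTER_REQUIRED.match?(name))
end
```

The lookahead only asks "is the whole string digits and underscores?", which leaves the rest of the pattern free to describe the underscore placement on its own.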

Upvotes: 0

Tim

Reputation: 9172

This doesn't block "__", but it does get the rest:

([A-Za-z]|[0-9][0-9_]*)([A-Za-z0-9]|_[A-Za-z0-9])*

And here's the longer form that gets all your rules:

([A-Za-z]|([0-9]+(_[0-9]+)*([A-Za-z]|_[A-Za-z])))([A-Za-z0-9]|_[A-Za-z0-9])*

Dang, that's ugly. I agree with Telemachus that you probably shouldn't do this with one regex, even though it's technically possible. Regexes are often a pain to maintain.

Upvotes: 1

Sinan Taifour

Reputation: 10795

What about:

/^(?=[^_])([A-Za-z0-9]+_?)*[A-Za-z](_?[A-Za-z0-9]+)*$/

It doesn't use a back reference.

Edit:

It succeeds for all your test cases and is Ruby-compatible.

Upvotes: 2

Welbog

Reputation: 60398

(?=.*[a-zA-Z].*)^[A-Za-z0-9](_?[A-Za-z0-9]+)*$

This one works.

Look ahead to make sure there's at least one letter in the string, then start consuming input. Every time there is an underscore, there must be a number or a letter before the next underscore.

Upvotes: 0

Rado

Reputation: 8963

Here you go:

^(([a-zA-Z]([^a-zA-Z0-9]?[a-zA-Z0-9])*)|([0-9]([^a-zA-Z0-9]?[a-zA-Z0-9])*[a-zA-Z]+([^a-zA-Z0-9]?[a-zA-Z0-9])*))$

If you want to restrict which symbols are accepted, replace every [^a-zA-Z0-9] with a character class [] containing only the allowed symbols.

Upvotes: 0

rpjohnst

Reputation: 1632

[A-Za-z][A-Za-z0-9_]*[A-Za-z]

That would work for your first two rules (since it requires a letter at the beginning and end for the second rule, it automatically requires letters).

I'm not sure the third rule is possible using regexes.
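
For what it's worth, a short sketch shows where this pattern falls short. It assumes the third rule is the ban on consecutive underscores (as the other answers suggest), and it adds `\A` and `\z` anchors that are not in the pattern above, since without them it would match anywhere inside a longer string. `LETTER_ENDS` is a made-up name.

```ruby
# The pattern above, anchored to the whole string (the anchors are my addition).
LETTER_ENDS = /\A[A-Za-z][A-Za-z0-9_]*[A-Za-z]\z/

# "a__b" slips through, while a single letter or a name
# ending in a digit is turned away.
%w[ab a_b a a1 a__b].each do |name|
  puts "#{name}: #{LETTER_ENDS.match?(name)}"
end
```

So besides missing the double-underscore case, requiring a letter at both ends is stricter than simply forbidding underscores there.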

Upvotes: -2

Telemachus

Reputation: 19705

I'm sure that you could put all this into one regular expression, but it won't be simple, and I'm not sure why you insist on it being one regex. Why not use multiple passes during validation? If the validation checks are done when users create a new account, there really isn't any reason to try to cram it into one regex. (That is, you will only be dealing with one item at a time, not hundreds or thousands or more. A few passes over a normal-sized username should take very little time, I would think.)

First reject if the name doesn't contain at least one number; then reject if the name doesn't contain at least one letter; then check that the start and end are correct; etc. Each of those passes could be a simple to read and easy to maintain regular expression.
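
A minimal sketch of the multi-pass approach might look like the following. The exact checks are an assumption on my part: they follow the rules as listed in the other answers (allowed characters, at least one letter, no leading or trailing underscore, no doubled underscore), and `USERNAME_CHECKS` and `username_errors` are hypothetical names.

```ruby
# One small, readable check per rule. Each lambda returns true when the rule holds.
USERNAME_CHECKS = {
  'must contain only letters, digits, and underscores' => ->(s) { s.match?(/\A[A-Za-z0-9_]+\z/) },
  'must contain at least one letter' => ->(s) { s.match?(/[A-Za-z]/) },
  'must not start or end with an underscore' => ->(s) { !s.match?(/\A_|_\z/) },
  'must not contain consecutive underscores' => ->(s) { !s.include?('__') }
}.freeze

# Hypothetical helper: returns the messages for every rule the name breaks
# (an empty array means the name passed every pass).
def username_errors(name)
  USERNAME_CHECKS.reject { |_message, check| check.call(name) }.keys
end

%w[abc_1 _12_ a__b].each do |name|
  puts "#{name}: #{username_errors(name).inspect}"
end
```

A side benefit of separate passes is that each failure maps to its own message, which a single combined regex can't give you.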

Upvotes: 2
