Reputation: 183
I'm working on a script that determines whether a string would be a valid variable name. It's very basic, but I can't seem to figure out how to use regular expressions.
So basically I want:
A-Z
a-z
0-9
no whitespace anywhere
no special char except _
Is that possible? This is what I tried:
re.match("[a-zA-Z0-9_,/S]*$", char_s):
Upvotes: 0
Views: 86
Reputation: 60127
The correct methods:
Python 2
import re
import keyword
import tokenize
re.match(tokenize.Name+"$", char_s) and not keyword.iskeyword(char_s)
Python 3
import keyword
char_s.isidentifier() and not keyword.iskeyword(char_s)
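Note that the Python 2 method does not raise an error on Python 3 but gives wrong answers there, because tokenize.Name is a looser pattern in Python 3 (it accepts names that start with a digit).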
When you see this kind of question, the first thing you should ask is "how does Python do it?", because almost all of the time Python exposes a method for it to the user.
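As a minimal sketch of the Python 3 approach above, the check can be wrapped in a small helper. The function name is_valid_variable_name and the sample strings are only illustrative and do not come from the original answer.

```python
import keyword


def is_valid_variable_name(candidate: str) -> bool:
    """Return True if `candidate` could be used as a Python 3 variable name."""
    # str.isidentifier() checks the lexical rules: no leading digit, no
    # whitespace, no punctuation other than "_". It also accepts non-ASCII letters.
    # keyword.iskeyword() rejects reserved words such as "for" or "class".
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


for sample in ["my_var", "_x1", "1abc", "has space", "for"]:
    print(sample, is_valid_variable_name(sample))
```

Because isidentifier() follows the language's own rules, it also accepts non-ASCII letters; if only A-Z, a-z, 0-9 and _ should be allowed, an explicit ASCII pattern is the stricter choice.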
Upvotes: 1
Reputation: 14089
Well, on top of the regular expressions mentioned, you need to make sure it is not one of the reserved keywords (this is the Python 2 list):
and del from not while
as elif global or with
assert else if pass yield
break except import print
class exec in raise
continue finally is return
def for lambda try
So something like this:
import re
reserved = ["and", "del", "from", "not", "while", "as", "elif", "global", "or", "with", "assert", "else", "if", "pass", "yield", "break", "except", "import", "print", "class", "exec", "in", "raise", "continue", "finally", "is", "return", "def", "for", "lambda", "try"]
def is_valid(name):
    # the parameter is called "name" so it does not shadow the keyword module
    return (name not in reserved and
            re.match(r"^(?!\d)\w+$", name) is not None)  # pattern from p.s.w.g's answer
Or, as @nofinator suggests, you can (and probably should) just use keyword.iskeyword().
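A rough sketch of that suggestion, assuming Python 3: the hand-written list is replaced by keyword.iskeyword(), which uses the keyword list of the running interpreter. The compiled pattern _NAME is an illustrative ASCII-only choice, not part of the original answer.

```python
import keyword
import re

# ASCII-only identifier shape: a letter or underscore first,
# then any number of letters, digits, or underscores.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid(name: str) -> bool:
    # fullmatch() requires the whole string to fit the pattern, so no anchors are needed.
    return _NAME.fullmatch(name) is not None and not keyword.iskeyword(name)


for sample in ["spam", "while", "9lives", "print"]:
    print(sample, is_valid(sample))
```

One consequence of deferring to the interpreter is that the answer tracks the Python version: print and exec are keywords in Python 2 but ordinary names in Python 3.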
Upvotes: 3
Reputation: 149000
A pattern like this should work:
^[a-zA-Z_][a-zA-Z0-9_]*$
Or more simply:
^(?!\d)\w+$
In both cases, the pattern matches a string that consists of one or more letters, digits, or underscores, as long as it doesn't start with a digit.
The (?!…) in the second pattern is a negative look-ahead assertion. It ensures the first character is not a digit. More information can be found in the manual.
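The two patterns can be tried side by side with a short sketch like the one below. The helper names matches and matches_strict are made up for this illustration. It also shows one caveat of anchoring with $: in Python, $ matches just before a trailing newline as well, so re.fullmatch() is the safer choice when no whitespace at all is allowed.

```python
import re

EXPLICIT = r"^[a-zA-Z_][a-zA-Z0-9_]*$"
LOOKAHEAD = r"^(?!\d)\w+$"


def matches(pattern: str, text: str) -> bool:
    # re.match() anchors at the start of the string; "$" anchors the end.
    return re.match(pattern, text) is not None


def matches_strict(text: str) -> bool:
    # re.fullmatch() must consume the entire string, including any trailing newline.
    return re.fullmatch(r"[a-zA-Z_][a-zA-Z0-9_]*", text) is not None


for sample in ["abc_1", "1abc", "a b", "abc\n"]:
    print(repr(sample), matches(EXPLICIT, sample), matches(LOOKAHEAD, sample), matches_strict(sample))
```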
Upvotes: 4
Reputation: 361585
re.match(r"^[^\W\d]\w*$", char_s):
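The \w word-character class is equivalent to [a-zA-Z0-9_] for ASCII matching (with Python 3 str patterns it also matches Unicode word characters unless re.ASCII is set). Identifiers cannot start with a digit, so match [^\W\d] for the first character and \w* for the rest of them.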
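To see how [^\W\d] behaves with and without ASCII-only matching, here is a small sketch assuming Python 3. The names UNICODE_NAME, ASCII_NAME and is_name are invented for the example.

```python
import re

# [^\W\d] reads as "not a non-word character and not a digit",
# which leaves letters and the underscore.
UNICODE_NAME = re.compile(r"[^\W\d]\w*")
ASCII_NAME = re.compile(r"[^\W\d]\w*", re.ASCII)


def is_name(text: str, ascii_only: bool = True) -> bool:
    pattern = ASCII_NAME if ascii_only else UNICODE_NAME
    # fullmatch() makes the pattern cover the whole string.
    return pattern.fullmatch(text) is not None


for sample in ["abc1", "1abc", "_", "a-b", "é1"]:
    print(repr(sample), is_name(sample), is_name(sample, ascii_only=False))
```

With re.ASCII the pattern accepts only the A-Z, a-z, 0-9 and _ set asked for in the question; without it, accented and other non-ASCII letters are accepted too.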
Upvotes: 1