Vivek Jha

Reputation: 1580

Python Regex For Matching Multiword Names

I want to match only alphabetic characters, i.e. a-z or A-Z, optionally with spaces between them. The intent is to match multiword names like 'Vivek Jha'. I expected the following regex to work:

re.match(r'^[aA-zZ\s]+$', name)

It works for all the cases I tried, but it also matches 'Vivek_Jha'.

I do not want an underscore to be matched. How is this _ getting matched?

I have worked with regexes in Perl and Tcl, but I think Python is doing something more than I can imagine.

Upvotes: 2

Views: 9260

Answers (3)

Kasravnd

Reputation: 107287

If you want to match only alphabetic characters, which can also contain spaces, just use:

r'^[a-zA-Z ]+$'

Note that aA-zZ is not the right way to match letters: use a-z for lowercase and A-Z for uppercase. Note also that \s matches more than a plain space:

The \s metacharacter is used to find a whitespace character.

A whitespace character can be:

A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
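
To make the difference between \s and a literal space concrete, here is a minimal sketch. The variable names and the tab-separated sample string are only illustrative and are not from the question:

```python
import re

# Two candidate patterns: one allows any whitespace, one allows only a plain space.
with_whitespace = re.compile(r'^[a-zA-Z\s]+$')
space_only = re.compile(r'^[a-zA-Z ]+$')

tabbed_name = 'Vivek\tJha'   # hypothetical input containing a tab
spaced_name = 'Vivek Jha'

# \s accepts the tab, the literal space does not.
tab_ok_with_s = bool(with_whitespace.match(tabbed_name))
tab_ok_space_only = bool(space_only.match(tabbed_name))

# Both accept an ordinary space-separated name.
space_ok_with_s = bool(with_whitespace.match(spaced_name))
space_ok_space_only = bool(space_only.match(spaced_name))

print(tab_ok_with_s, tab_ok_space_only)        # True False
print(space_ok_with_s, space_ok_space_only)    # True True
```

If names should only ever be separated by ordinary spaces, the literal-space version is probably the safer choice.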

Upvotes: 4

Tomty

Reputation: 2022

Try a-zA-Z instead of aA-zZ.

a-z has nothing but letters between its endpoints, and the same goes for A-Z, but A-z has a lot of other characters in between, including the underscore.

Upvotes: 2

user2555451


A-z is capturing everything from ASCII character A to ASCII character z. This includes the _ character as well as many others. For more information on this, you can view Wikipedia's ASCII article.
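
As a quick check of that claim, the sketch below lists the ASCII characters that sit between 'Z' and 'a' and compares the two character classes. The names `between`, `loose` and `strict` are just illustrative:

```python
import re

# ASCII characters that fall strictly between 'Z' (90) and 'a' (97).
between = [chr(code) for code in range(ord('Z') + 1, ord('a'))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

# The class from the question: 'a', the range A-z, 'Z', and whitespace.
loose = re.compile(r'^[aA-zZ\s]+$')
# Two separate ranges, letters only, plus whitespace.
strict = re.compile(r'^[a-zA-Z\s]+$')

loose_matches_underscore = bool(loose.match('Vivek_Jha'))
strict_matches_underscore = bool(strict.match('Vivek_Jha'))
strict_matches_name = bool(strict.match('Vivek Jha'))

print(loose_matches_underscore)   # True
print(strict_matches_underscore)  # False
print(strict_matches_name)        # True
```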

To fix the problem, you need to do:

re.match(r'[a-zA-Z\s]+$', name)

This tells Python to only capture characters in the ASCII ranges a-z and A-Z.

Also, I removed the ^ because re.match matches from the start of the string by default.
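
A small sketch of that point, with illustrative names: with re.match the leading ^ makes no difference, while with re.search it does. It also shows re.fullmatch (available since Python 3.4) as one possible alternative that needs neither anchor; that is a suggestion, not something the question asked for.

```python
import re

with_caret = r'^[a-zA-Z\s]+$'
without_caret = r'[a-zA-Z\s]+$'
bad_name = '_Vivek'   # hypothetical input starting with an underscore

# re.match only tries to match at position 0, so both patterns reject this input.
match_with_caret = re.match(with_caret, bad_name)
match_without_caret = re.match(without_caret, bad_name)

# re.search scans forward, so without ^ it finds 'Vivek' after the underscore.
search_with_caret = re.search(with_caret, bad_name)
search_without_caret = re.search(without_caret, bad_name)

# re.fullmatch requires the whole string to match, so no ^ or $ is needed.
full_good = re.fullmatch(r'[a-zA-Z\s]+', 'Vivek Jha')
full_bad = re.fullmatch(r'[a-zA-Z\s]+', 'Vivek_Jha')

print(match_with_caret, match_without_caret)   # None None
print(search_with_caret)                       # None
print(search_without_caret.group())            # Vivek
print(full_good.group(), full_bad)             # Vivek Jha None
```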

Upvotes: 6
