hazrmard

Reputation: 3661

How to match all unicode alphabetic characters and spaces in a regex?

I am trying to validate place names in Python 3 / Django forms. I want to match strings like: Los Angeles, Canada, 中国, and Россия. That is, the string contains only alphabetic characters (in any script) and spaces.

The pattern I am currently using is r'^[^\W\d]+$', as suggested in "How to match alphabetical chars without numeric chars with Python regexp?". However, it only seems to behave like the pattern r'^[a-zA-Z]+$'. That is, Россия, Los Angeles and 中国 do not match; only Canada does.

An example of my code:

import re
re.search(r'^[^\W\d]+$', 'Россия')

This returns None (no match).

Upvotes: 2

Views: 3541

Answers (1)

Mark Tolonen

Reputation: 177675

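Your example works for me, but it will match underscores and not spaces: \w includes the underscore, so [^\W\d] still accepts _, and nothing in the character class allows a space. This works: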

>>> re.search(r'^(?:[^\W\d_]| )+$', 'Los Angeles')
<_sre.SRE_Match object at 0x0000000003C612A0>
>>> re.search(r'^(?:[^\W\d_]| )+$', 'Россия')
<_sre.SRE_Match object at 0x0000000003A0D030>
>>> re.search(r'^(?:[^\W\d_]| )+$', 'Los_Angeles') # not found
>>> re.search(r'^(?:[^\W\d_]| )+$', '中国')
<_sre.SRE_Match object at 0x0000000003C612A0>
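
The same pattern can be wrapped in a small helper for use in a form validator. This is only a minimal sketch: the names PLACE_NAME_RE and is_place_name are made up for illustration, and it assumes Python 3 str input, where \W and \d are Unicode-aware by default.

```python
import re

# Same character class as above: a "word" character that is neither a
# digit nor an underscore, or a literal space.
# PLACE_NAME_RE and is_place_name are hypothetical names for this sketch.
PLACE_NAME_RE = re.compile(r'(?:[^\W\d_]| )+')

def is_place_name(text):
    """Return True if text consists only of letters (any script) and spaces."""
    # fullmatch anchors at both ends of the string, so ^ and $ are not needed.
    return PLACE_NAME_RE.fullmatch(text) is not None

for name in ['Los Angeles', 'Canada', '中国', 'Россия', 'Los_Angeles', 'Area 51']:
    print(name, is_place_name(name))
```

Using fullmatch instead of ^...$ also rejects a trailing newline, which $ would otherwise tolerate. Note that a string made only of spaces still passes, so strip or check for that separately if it matters. If the pattern is compiled with re.ASCII or applied to bytes, \W and \d become ASCII-only, which would give the [a-zA-Z]-like behavior described in the question.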

Upvotes: 3
