Reputation: 167
import re

def street_regex(street):
    street_regex = ""
    regex = re.compile("^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)")
    s = regex.search(street)
    if s:
        street_regex = s.group()
    else:
        street_regex = street
    return street_regex
That is my code. I got the regex from one of my previous posts here. However, when I call my function, the regex doesn't work and I don't get the result I want (see the previous post for what I mean). I'm using Python 3.4, if that helps.
Upvotes: 1
Views: 61
Reputation: 6511
The re module does not support Unicode properties. However, if you set the re.UNICODE flag (already the default for str patterns in Python 3), \w matches alphanumerics from all scripts. Hence, [^\W\d_] matches only letters, as \p{L} was intended to.

\W matches non-word characters (everything except the Letter category, the Number category, and "_")
\d matches digits included in the Number category
[^\W\d_] matches anything EXCEPT non-word characters, digits or "_", which means it matches only letters

Code:
#python 3.4.3
import re

# Named "street" rather than "str" to avoid shadowing the built-in str type.
street = u"Stréêt -Name 123S"
r = re.compile(r'^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)', re.UNICODE)
s = r.search(street)
print(s.group())
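Applied to the function from the question, the same idea might look like the sketch below. The names LETTER and STREET_RE are illustrative and not part of the original code; the fallback of returning the input unchanged when nothing matches is taken from the question.

```python
import re

# Letter class for the built-in re module: word characters minus digits and "_".
# (LETTER and STREET_RE are illustrative names, not from the original post.)
LETTER = r"[^\W\d_]"

# Letters with inner spaces or hyphens, then an optional house number
# (1-4 digits) with an optional single-letter suffix.
STREET_RE = re.compile(
    r"^(" + LETTER + r"(?:" + LETTER + r"|[- ])*" + LETTER
    + r"(?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)",
    re.UNICODE,
)


def street_regex(street):
    """Return the leading street-and-number part, or the input if nothing matches."""
    match = STREET_RE.search(street)
    return match.group() if match else street


print(street_regex("Stréêt -Name 123S"))
print(street_regex("Main Street 5, Springfield"))
```

The raw-string prefix matters here: in a normal string literal, "\b" is a backspace character, not a word boundary.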
Alternatively, you can use the regex module, which adds support for Unicode properties.
Upvotes: 0
Reputation: 174706
You need to use the regex module. Your regex is correct, but Python's built-in regular expression module, re, doesn't support PCRE-style Unicode property patterns such as \p{L} and \p{N}. You can use [a-zA-Z] instead of \p{L} with re, but that matches only English letters, not letters from any language (as \p{L} does).
>>> import re
>>> import regex
>>> re.search(r'\p{L}+', 'foo')
>>> regex.search(r'\p{L}+', 'foo')
<regex.Match object; span=(0, 3), match='foo'>
>>>
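If the code has to run both with and without the third-party regex package, one possible approach is to prefer \p{L} when regex can be imported and fall back to the [^\W\d_] class otherwise. This is only a sketch; the helper name first_word is made up for illustration.

```python
import re

try:
    import regex  # third-party package, installed separately (pip install regex)
except ImportError:
    regex = None

if regex is not None:
    # regex understands Unicode property classes such as \p{L}.
    letters = regex.compile(r"\p{L}+")
else:
    # Built-in re: word characters minus digits and "_" leaves only letters.
    letters = re.compile(r"[^\W\d_]+")


def first_word(text):
    """Return the first run of letters in text, or None (hypothetical helper)."""
    match = letters.search(text)
    return match.group() if match else None


print(first_word("123 Stréêt"))
```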
Upvotes: 1