Is the regex wrong or is it my code?

Question

import re

def street_regex(street):
    street_regex = ""

    regex = re.compile("^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)")
    s = regex.search(street)

    if s:
        street_regex = s.group()
    else:
        street_regex = street

    return street_regex

So that is my code. I got the regex I'm using from one of my previous posts here. However, when I call my function, the regex doesn't work and I don't get what I want. (See the previous post to understand what I mean.) I'm using Python 3.4, if that helps.

Mariano · Accepted Answer

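The re module does not support Unicode property escapes such as \p{L}. However, with the re.UNICODE flag set, \w matches alphanumerics from all scripts (in Python 3 this is already the default for str patterns, so the flag is optional). Hence, [^\W\d_] matches only letters, which is what \p{L} was intended to do.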

\W matches non-word characters, i.e. anything that is not in the Letter category, not in the Number category, and not "_"
\d matches the digits included in the Number category
So [^\W\d_] matches anything EXCEPT non-word characters, digits, or "_", which means it matches only letters
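
To make the breakdown above concrete, here is a small check of the character class on its own. The sample characters are arbitrary picks for illustration, not taken from the question.

```python
import re

# [^\W\d_] = word characters, minus digits, minus the underscore.
letter = re.compile(r'[^\W\d_]')

# Arbitrary sample characters: some letters from different scripts, some non-letters.
samples = ["a", "é", "ß", "Ж", "5", "_", "-", " "]
for ch in samples:
    # fullmatch (available since Python 3.4) requires the whole string to match.
    print(repr(ch), bool(letter.fullmatch(ch)))
```

The letters should report True and the digit, underscore, hyphen, and space should report False.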

Code:

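# Python 3.4.3
import re

# Named "street" rather than "str" so the built-in type is not shadowed.
street = "Stréêt -Name 123S"
# [^\W\d_] stands in for \p{L}; the raw string keeps \W, \d and \b intact.
r = re.compile(r'^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)', re.UNICODE)
s = r.search(street)
# search() returns None when nothing matches, so check before calling group().
if s:
    print(s.group())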
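
Applied to the function from the question, the same fix might look like the sketch below. STREET_PATTERN is a hypothetical module-level name and the sample inputs are made up. The pattern is written as a raw string so that \b reaches the regex engine as a word boundary rather than being turned into a backspace character by Python, and re.UNICODE is left out because it is the default for str patterns in Python 3.

```python
import re

# Hypothetical module-level name; compiling once avoids recompiling on every call.
STREET_PATTERN = re.compile(
    r'^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)'
)

def street_regex(street):
    """Return the leading street name and house number, or the input unchanged if nothing matches."""
    match = STREET_PATTERN.search(street)
    return match.group() if match else street

# Made-up sample inputs, for illustration only.
print(street_regex("Stréêt -Name 123S"))
print(street_regex("Main Street 12 b, 2nd floor"))
print(street_regex("12345"))
```

As in the original function, an input that does not match is returned unchanged.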

Alternatively, you can use the third-party regex module, which adds support for Unicode properties such as \p{L}.
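
A rough sketch of that alternative follows. It assumes the regex package has been installed separately (for example with pip install regex), and the fallback to re is my own addition so that the snippet still runs where the package is missing.

```python
import re

try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

if regex is not None:
    # With the regex module, the pattern from the question works once it is a raw string.
    pattern = regex.compile(r'^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)')
else:
    # Fallback (an assumption for this sketch): the re-compatible pattern from the answer.
    pattern = re.compile(r'^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)')

match = pattern.search("Stréêt -Name 123S")
print(match.group() if match else None)
```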
