ZeZe

Reputation: 167

Is the regex wrong or is it my code?

import re

def street_regex(street):
    street_regex = ""

    regex = re.compile("^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)")
    s = regex.search(street)

    if s:
        street_regex = s.group()
    else:
        street_regex = street

    return street_regex

So that is my code. I got the regex I'm using from one of my previous posts on here. However, when I call my function the regex doesn't work and I don't get the result I want (see the previous post to understand what I mean). I'm using Python 3.4, if that helps.

Upvotes: 1

Views: 61

Answers (2)

Mariano

Reputation: 6511

The re module does not support Unicode properties such as \p{L}. However, with the re.UNICODE flag set (which is already the default for str patterns in Python 3), \w matches alphanumerics from all scripts. Hence, [^\W\d_] matches only letters, as \p{L} was intended to.

  • \W matches non-word characters (anything other than the Letter category, the Number category, and "_")
  • \d matches digits included in the Number category
  • So [^\W\d_] will match anything EXCEPT non-word characters, digits or "_"... which means it will only match letters
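To make the reasoning above concrete, here is a minimal sketch that tests single characters against [^\W\d_]. The helper name is_letter is hypothetical and only for illustration. Treat the class as a close approximation of \p{L} rather than an exact equivalent: a few numeric characters that are not decimal digits may also slip through.

```python
import re

# Word characters minus digits and underscore, i.e. (approximately) letters.
LETTER = re.compile(r"[^\W\d_]")

def is_letter(ch):
    """Return True if the single character ch matches the letter class."""
    return LETTER.fullmatch(ch) is not None

# Letters from several scripts, followed by characters that should be rejected.
for ch in ["a", "é", "ß", "Я", "5", "_", "-", " "]:
    print(repr(ch), is_letter(ch))
```

The first four characters should be reported as letters and the last four should not.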

Code:

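#python 3.4.3
import re

# Named "street" rather than "str" to avoid shadowing the built-in type.
street = u"Stréêt -Name 123S"
# [^\W\d_] stands in for \p{L}, which re does not understand.
r = re.compile(r'^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)', re.UNICODE)
s = r.search(street)
print(s.group())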
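Applied to the function from the question, the same pattern might be used as follows. This is only a sketch under the assumption that the function should keep its original behaviour (return the match, or the input unchanged when nothing matches). The constant name STREET_RE and the sample addresses are made up for illustration.

```python
import re

# Pattern from this answer, compiled once at import time.
# STREET_RE is a hypothetical name, not part of the original question.
STREET_RE = re.compile(
    r"^([^\W\d_](?:[^\W\d_]|[- ])*[^\W\d_](?: [0-9]{1,4}(?: ?[A-Za-z])?)?\b)"
)

def street_regex(street):
    """Return the leading street name and optional house number,
    or the input unchanged if the pattern does not match."""
    match = STREET_RE.search(street)
    return match.group() if match else street

print(street_regex("Stréêt -Name 123S"))
print(street_regex("Hauptstraße 12 a, 2. Stock"))
print(street_regex("12345"))
```

The raw-string prefix (r"...") keeps the backslashes intact, and the early return replaces the temporary variable used in the question.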



Alternatively, you can use the third-party regex module (installed separately from PyPI), which adds support for Unicode properties such as \p{L}.

Upvotes: 0

Avinash Raj

Reputation: 174706

You need to use the regex module. Your regex is correct, but Python's default regex module, re, doesn't support PCRE-style patterns such as \p{L} or \p{N}. You could use [a-zA-Z] instead of \p{L} with re, but that matches only the English alphabet, not any letter from any language as \p{L} does.

>>> import re
>>> import regex
>>> re.search(r'\p{L}+', 'foo')  # no match under Python 3.4, so nothing is printed
>>> regex.search(r'\p{L}+', 'foo')
<regex.Match object; span=(0, 3), match='foo'>
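The limitation of [a-zA-Z] mentioned above can be seen with the standard re module alone. The following is a small sketch using a made-up street name: the ASCII class stops at the first accented letter, whereas a Unicode-aware class (here [^\W\d_], used as a stand-in for \p{L}) continues through it.

```python
import re

ascii_letters = re.compile(r"[a-zA-Z]+")      # English alphabet only
unicode_letters = re.compile(r"[^\W\d_]+")    # letters from any script (approximation of \p{L})

name = "Königsallee"  # illustrative input, not from the question

print(ascii_letters.match(name).group())    # stops before "ö"
print(unicode_letters.match(name).group())  # covers the whole word
```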

Upvotes: 1
