Jamzy

Reputation: 159

Identify an address number in a string

I have a list of addresses, currently quite unclean. They take the format:

955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street

What I would like to do is split the street number from the street name. I need a regular expression that matches each of these:

955 - 959
95-99
4-9
95 - 99
99

I currently have this:

^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)

which works for the two-digit addresses but not for the three-digit or one-digit addresses.

Thanks

Upvotes: 2

Views: 246

Answers (5)

Harshit Garg

Reputation: 2259

Another way could be

In [83]: s = '955 - 959 Fake Street'

In [84]: s1 = '95-99 Fake Street'

In [85]: s2 = '95 - 99 Fake Street'

In [86]: s3 = '99 Fake Street'

In [87]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s3)

In [88]: d.group()
Out[88]: '99 '

In [89]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s2)

In [90]: d.group()
Out[90]: '95 - 99'

In [91]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s1)

In [92]: d.group()
Out[92]: '95-99'

In [93]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s)

In [94]: d.group()
Out[94]: '955 - 959'

The character set [0-9] can also be written as \d, like this:

d = re.search(r'^[\d]+[ ]*(-[ ]*[\d]+){0,1}', s)

In all of these examples we search at the beginning of the string for a sequence of one or more digits, followed by zero or more spaces, optionally followed by a single - symbol, zero or more spaces, and one or more digits.
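The same checks can be run in a single loop. This is only a minimal sketch, assuming Python 3; the names `NUMBER_RE` and `leading_number` are made up for illustration. It writes `?` in place of `{0,1}` (the two are equivalent) and strips the trailing space that shows up in Out[88].

```python
import re

# Same pattern as in the session above, with ? in place of {0,1}.
NUMBER_RE = re.compile(r'^[0-9]+[ ]*(-[ ]*[0-9]+)?')

def leading_number(address):
    """Return the leading street number (or range), or None if there is none."""
    m = NUMBER_RE.search(address)
    # The pattern can swallow the space after a single number, so strip it.
    return m.group().rstrip() if m else None

samples = ['955 - 959 Fake Street', '95-99 Fake Street', '4-9 M4 Ln',
           '95 - 99 Fake Street', '99 Fake Street']
for s in samples:
    print(repr(leading_number(s)))
```

Compiling the pattern once keeps the loop tidy; whether stripping the trailing space is wanted depends on how the result is used afterwards.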

Upvotes: 0

dawg

Reputation: 103814

For your example, you can do:

/^(\d+[-\s\d]*)\s/gm


Explanation:

/^(\d+[-\s\d]*)\s/gm
 ^                      start of line
    ^                   at least 1 digit and as many digits as possible
       ^                any character of the set -, space, digit
             ^          zero or more
                ^       trailing space
                    ^   multiline for the ^ start of line assertion
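In Python the same idea might look like the sketch below. It assumes the addresses arrive as one newline-separated string; `addresses` and `numbers` are illustrative names. `re.MULTILINE` stands in for the `m` flag and `findall` for the `g` flag.

```python
import re

addresses = """955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street"""

# re.MULTILINE lets ^ match at the start of every line, like /m.
pattern = re.compile(r'^(\d+[-\s\d]*)\s', re.MULTILINE)

# With a single capture group, findall returns that group for each match.
numbers = pattern.findall(addresses)
print(numbers)
# ['955 - 959', '95-99', '4-9', '95 - 99', '99']
```

Note that `\s` also matches a newline, so a line consisting only of digits could let the character class run on into the next line. The sample data has no such line.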

Upvotes: 2

Ignacio Catalina

Reputation: 197

Starting from your regex:

^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)

You have an extra whitespace matcher in the second block. Removing it gives:

^[0-9][0-9]\s*+(-\s*[0-9][0-9]+)

I would suggest you replace [0-9] with \d

^[\d][\d]\s*+(-\s*[\d][\d]+)

Use a + instead of two copies of \d, meaning at least one digit:

^[\d]+\s*+(-\s*[\d]+)

Make the last block optional, so it also matches 99 Fake Street:

^[\d]+\s*+(-\s*[\d]+)?

If you know there's only going to be 1 white space, you could replace \s* with \s?:

^[\d]+\s?(-\s?[\d]+)?

That should match all of them :D
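To see what each step buys you, here is a rough sketch that counts how many of the sample addresses each stage matches. It assumes Python, and it writes plain `\s*` instead of the possessive `\s*+`, because Python's `re` module only accepts possessive quantifiers from version 3.11 onward. The names `samples`, `stages` and `counts` are just for illustration.

```python
import re

samples = ['955 - 959 Fake Street', '95-99 Fake Street', '4-9 M4 Ln',
           '95 - 99 Fake Street', '99 Fake Street']

stages = [
    r'^\d\d\s*(-\s*\d\d+)',  # exactly two leading digits, range part required
    r'^\d+\s*(-\s*\d+)',     # one or more digits on each side of the dash
    r'^\d+\s*(-\s*\d+)?',    # range part made optional
]

# Count the samples each stage matches.
counts = [sum(1 for s in samples if re.search(p, s)) for p in stages]
for p, c in zip(stages, counts):
    print(f'{p:<24} matches {c} of {len(samples)}')
```

On these samples the counts come out as 2, 4 and 5: the first stage only handles the two-digit ranges, the second adds the one- and three-digit ranges, and the optional group picks up the lone 99.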

Upvotes: 2

court3nay

Reputation: 2365

You can use braces, e.g. {2,3} for 2-3 digits - but the *+ also isn't right.

/^(([0-9]{1,3}\s-\s)?[0-9]{1,3})\s/

I nested the groups, so you only want the first capture group from the regex.

It breaks up like this:

([0-9]{1,3}\s-\s)?

First: is there a 1-3 digit number followed by space-dash-space? This part is optional.

Then: does it end in a 1-3 digit number followed by a space?
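A quick way to try this from Python is sketched below; `first_group`, `STRICT` and `LOOSE` are hypothetical names. As written, the pattern expects whitespace on both sides of the dash, so 95-99 and 4-9 do not match. The `LOOSE` variant, which swaps `\s-\s` for `\s*-\s*`, is my own adjustment and not part of the answer above.

```python
import re

# The pattern from this answer, without the surrounding slashes.
STRICT = r'^(([0-9]{1,3}\s-\s)?[0-9]{1,3})\s'
# Assumed variant: whitespace around the dash is optional.
LOOSE = r'^(([0-9]{1,3}\s*-\s*)?[0-9]{1,3})\s'

def first_group(pattern, address):
    """Return capture group 1 (the whole number or range), or None."""
    m = re.match(pattern, address)
    return m.group(1) if m else None

for s in ['955 - 959 Fake Street', '95-99 Fake Street', '99 Fake Street']:
    print(s, '|', first_group(STRICT, s), '|', first_group(LOOSE, s))
```

Because the trailing `\s` sits outside group 1, the captured number never carries a trailing space.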

Upvotes: 2

A. L

Reputation: 12649

I'm not sure what you're trying to do with \s*+ here, but you basically had the answer in the last part, [0-9][0-9]+, which finds 2+ digits at the end.

Maybe try this (it's more concise). It searches for 1+ digits instead of 2+:

\d+(\s*-\s*\d+)?
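Since the goal is to split the number from the street name, the pattern can be wrapped in named groups. The following is only a sketch, assuming Python; `ADDRESS_RE` and `split_address` are made-up names, and the `^` anchor plus the `street` group are my additions. Without the anchor, a search on a name like M4 Ln would pick up the 4.

```python
import re

# Number or range at the start, then whitespace, then the rest as the street.
ADDRESS_RE = re.compile(
    r'^(?P<number>\d+(?:\s*-\s*\d+)?)\s+(?P<street>.+)$'
)

def split_address(address):
    """Return (number, street), or None if the address has no leading number."""
    m = ADDRESS_RE.match(address.strip())
    return (m.group('number'), m.group('street')) if m else None

for s in ['955 - 959 Fake Street', '4-9 M4 Ln', '99 Fake Street']:
    print(split_address(s))
```

Returning None for unparseable rows makes it easy to collect the addresses that still need cleaning by hand.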

Upvotes: 4
