Reputation: 159
I have a list of addresses that is currently quite messy. They take these formats:
955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street
What I would like to do is split each address into its street number and street name. I need a regex that matches:
955 - 959
95-99
4-9
95 - 99
99
I currently have this:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
which works for the two-digit addresses but not for the one- or three-digit ones.
Thanks
Upvotes: 2
Views: 246
Reputation: 2259
Another way could be the following IPython session (with re already imported):
In [83]: s = '955 - 959 Fake Street'
In [84]: s1 = '95-99 Fake Street'
In [85]: s2 = '95 - 99 Fake Street'
In [86]: s3 = '99 Fake Street'
In [87]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s3)
In [88]: d.group()
Out[88]: '99 '
In [89]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s2)
In [90]: d.group()
Out[90]: '95 - 99'
In [91]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s1)
In [92]: d.group()
Out[92]: '95-99'
In [93]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s)
In [94]: d.group()
Out[94]: '955 - 959'
The character set 0-9 can be represented by \d, like this:
d = re.search(r'^[\d]+[ ]*(-[ ]*[\d]+){0,1}', s)
Here, in all the examples, we search at the beginning of the string for a sequence of at least one digit, followed by zero or more spaces, optionally followed by one group made up of a single - symbol, zero or more spaces and one or more digits.
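A minimal Python sketch that runs the pattern from this answer over the five sample addresses from the question in a loop (the names PATTERN, addresses and numbers are just for illustration). Because [ ]* also swallows the space after a lone number, the match for 99 Fake Street is '99 ', so the sketch strips trailing whitespace:

```python
import re

# Pattern from the answer: digits at the start, optional spaces, then at most
# one group of "-", optional spaces and more digits.
PATTERN = re.compile(r'^[\d]+[ ]*(-[ ]*[\d]+){0,1}')

# Sample addresses copied from the question.
addresses = [
    '955 - 959 Fake Street',
    '95-99 Fake Street',
    '4-9 M4 Ln',
    '95 - 99 Fake Street',
    '99 Fake Street',
]

numbers = []
for address in addresses:
    m = PATTERN.search(address)
    # [ ]* may consume the space after a single number ("99 "), so strip it.
    numbers.append(m.group().rstrip() if m else None)

print(numbers)  # ['955 - 959', '95-99', '4-9', '95 - 99', '99']
```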
Upvotes: 0
Reputation: 103814
For your example, you can do:
/^(\d+[-\s\d]*)\s/gm
Explanation:
/^(\d+[-\s\d]*)\s/gm
^          start of line
\d+        at least 1 digit, as many digits as possible
[-\s\d]    any character from the set: -, whitespace, digit
*          zero or more of those
\s         a trailing whitespace character
g          global: find every match, not just the first
m          multiline, so ^ matches at the start of each line
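Python has no /g or /m suffix, so a rough translation of this pattern (a sketch, assuming the addresses arrive as one newline-separated string) uses re.findall, which returns every match, and the re.MULTILINE flag in place of m. With exactly one capturing group, findall returns just that group's text:

```python
import re

# The question's addresses as a single multi-line string (assumed input shape).
text = """955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street"""

# re.findall returns all non-overlapping matches (like /g); re.MULTILINE lets
# ^ match at the start of every line (like /m). Only group 1 is returned.
numbers = re.findall(r'^(\d+[-\s\d]*)\s', text, re.MULTILINE)
print(numbers)  # ['955 - 959', '95-99', '4-9', '95 - 99', '99']
```

One caveat: \s inside [-\s\d] also matches a newline, so a line containing only a number range (no street name) would run on into the next line. None of the sample addresses do that.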
Upvotes: 2
Reputation: 197
Starting from your regex:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
You have an extra whitespace matcher in the second block; dropping it gives:
^[0-9][0-9]\s*+(-\s*[0-9][0-9]+)
I would suggest you replace [0-9] with \d:
^[\d][\d]\s*+(-\s*[\d][\d]+)
Use a + instead of 2 copies of \d, meaning at least one digit:
^[\d]+\s*+(-\s*[\d]+)
Make the last block optional, so it also matches 99 Fake Street:
^[\d]+\s*+(-\s*[\d]+)?
If you know there will be at most one space, you could replace \s* with \s? like this:
^[\d]+\s?(-\s?[\d]+)?
That should match all of them :D
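A small Python check of that final pattern against the question's sample addresses, which also splits off the street name using the match's end position (the results list and loop are illustrative, not part of the answer). Python's re module only accepts the possessive *+ from 3.11 onward, so the sketch sticks to the last, \s? version:

```python
import re

# Final pattern from the answer: digits, at most one space, then an
# optional "-", at most one space and more digits.
PATTERN = re.compile(r'^[\d]+\s?(-\s?[\d]+)?')

addresses = [
    '955 - 959 Fake Street',
    '95-99 Fake Street',
    '4-9 M4 Ln',
    '95 - 99 Fake Street',
    '99 Fake Street',
]

results = []
for address in addresses:
    m = PATTERN.match(address)
    if m is None:  # no leading number; not expected for this sample data
        continue
    number = m.group().strip()          # \s? can leave a trailing space ("99 ")
    street = address[m.end():].strip()  # whatever follows the number
    results.append((number, street))

for number, street in results:
    print(f'{number!r} -> {street!r}')
```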
Upvotes: 2
Reputation: 2365
You can use braces to limit the digit count, e.g. {2,3} for 2-3 digits. Also, *+ isn't right.
/^(([0-9]{1,3}\s*-\s*)?[0-9]{1,3})\s/
I nested the parentheses so that you only need the first capture group from the regex.
It breaks down like this:
([0-9]{1,3}\s*-\s*)?
First, is there a 1-3 digit number followed by a dash, with or without spaces around it? This part is OPTIONAL.
Then, does it end in a 1-3 digit number followed by a space?
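A Python sketch of this approach, assuming optional spaces around the dash (\s*-\s*) so that 95-99 and 4-9 match as well as 95 - 99. It also shows the trade-off of {1,3}: a hypothetical four-digit number such as 1200 is not matched at all.

```python
import re

# 1-3 digits, optionally preceded by "1-3 digits and a dash (spaces optional)",
# followed by a space. Group 1 holds the whole street number.
PATTERN = re.compile(r'^(([0-9]{1,3}\s*-\s*)?[0-9]{1,3})\s')

addresses = [
    '955 - 959 Fake Street',
    '95-99 Fake Street',
    '4-9 M4 Ln',
    '95 - 99 Fake Street',
    '99 Fake Street',
]

results = []
for address in addresses:
    m = PATTERN.match(address)
    results.append(m.group(1) if m else None)

print(results)  # ['955 - 959', '95-99', '4-9', '95 - 99', '99']

# Hypothetical four-digit house number: {1,3} cannot reach the space,
# so there is no match.
print(PATTERN.match('1200 Fake Street'))  # None
```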
Upvotes: 2
Reputation: 12649
I'm not sure what you're trying to do with \s*+ here, but you basically had the answer with the last part, [0-9][0-9]+, which finds 2 or more digits at the end.
Maybe try this (it's more concise); it searches for 1 or more digits instead of 2 or more:
\d+(\s*-\s*\d+)?
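Since the goal is to split the number from the street name, here is a minimal Python sketch built on that pattern; split_address is a hypothetical helper name. It uses re.match, which only matches at the start of the string. The pattern itself has no ^, so with re.search an address lacking a number (a hypothetical bare M4 Ln, say) would pick up the 4 from the street name:

```python
import re

# Pattern from the answer: 1+ digits, then optionally a dash surrounded by
# any amount of whitespace and 1+ more digits.
NUMBER = re.compile(r'\d+(\s*-\s*\d+)?')


def split_address(address):
    """Hypothetical helper: return (street_number, street_name).

    re.match anchors the pattern at the start of the string, so digits
    inside a street name such as "M4 Ln" are not mistaken for a number.
    """
    m = NUMBER.match(address)
    if m is None:
        return None, address.strip()
    return m.group(), address[m.end():].strip()


print(split_address('4-9 M4 Ln'))       # ('4-9', 'M4 Ln')
print(split_address('M4 Ln'))           # (None, 'M4 Ln')
print(NUMBER.search('M4 Ln').group())   # 4 (unanchored search finds a digit mid-name)
```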
Upvotes: 4