Reputation: 2907
I have address strings whose parts are separated by spaces, and I want to separate the house number from the street name.
So if we have:
Street Blah Blah 34
or
34 Street Blah Blah
I want one regex to match "Street Blah Blah" and another to match "34".
It can get more complex with addresses like this:
Überbrückerstraße 24a.
where it should return "24a" as the number and the rest as the street, or
Järnvägstationg. 3/B
where it should return "3/B" as the number and the rest as the street, etc.
I am currently doing this in C#: I split each string on spaces, return whichever token contains at least one digit as the number, and return all the rest as the street.
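A minimal sketch of what such a split-based approach might look like. The class and method names (`SplitApproach`, `Parse`) are hypothetical and only illustrate the description above; this is not the actual production code.

```csharp
using System;
using System.Linq;

public static class SplitApproach
{
    // Hypothetical helper: the first space-separated token that contains a digit
    // is treated as the number; all remaining tokens are joined back as the street.
    public static (string Number, string Street) Parse(string address)
    {
        string[] tokens = address.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int i = Array.FindIndex(tokens, t => t.Any(char.IsDigit));
        if (i < 0)
        {
            // No token contains a digit: the whole input is the street.
            return ("", string.Join(" ", tokens));
        }
        string street = string.Join(" ", tokens.Where((t, idx) => idx != i));
        return (tokens[i], street);
    }

    public static void Main()
    {
        var (number, street) = Parse("Street Blah Blah 34");
        Console.WriteLine($"<{number}> <{street}>");
    }
}
```

With this sketch, Ueckerstr. 20 b yields 20 as the number and Ueckerstr. b as the street, which is one reason that case is awkward for a token-based split.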
However, I was wondering whether it would be more elegant and more efficient to do this with a regex.
I've been fiddling with regexes, but I haven't found a robust one so far. Any ideas?
Here is some unit test data (input street, expected premise number, expected street):
[TestCase("Järvägstationg. 3/B", "3/B", "Järvägstationg.")]
[TestCase("Überbrückerstraße 24a", "24a", "Überbrückerstraße")]
[TestCase("Street Blah Blah 34", "34", "Street Blah Blah")]
[TestCase("34 Street Blah Blah", "34", "Street Blah Blah")]
[TestCase("Ueckerstr. 20 b", "20 b", "Ueckerstr.")]
[TestCase("Elmshornerstraße 163", "163", "Elmshornerstraße")]
[TestCase("Hallgartenerstrasse Moritzstr.", "", "Hallgartenerstrasse Moritzstr.")]
[TestCase("19 Green Lane", "19", "Green Lane")]
I think that, out of these,
Ueckerstr. 20 b
is the trickiest, so I don't mind if that one fails for now.
Upvotes: 0
Views: 250
Reputation: 29431
If your input strings follow the same format, you can use:
(?<street>.*) (?<number>.*)
See Live demo
Then access it with:
var address = "Überbrückerstraße 24a.";
var m = Regex.Matches(address, @"(?<street>.*) (?<number>.*)");
var street = m[0].Groups["street"].Value;
var streetNumber = m[0].Groups["number"].Value;
Console.WriteLine(string.Format("Street Name: {0}, at {1}", street, streetNumber));
outputs:
Street Name: Überbrückerstraße, at 24a.
See live C#
Given the test cases you added afterwards, I would use:
^(\d.*?) (.*)|(.*) (\d.*)|(.+)
where:
^(\d.*?) (.*)
matches the string with the number at the beginning;
(.*) (\d.*)
matches the string with the number at the end;
(.+)
matches the string that doesn't contain numbers. It must stay at the end or it will capture every case.
See Demo
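One way to read the result back in C# is to check which alternative matched: groups 1 and 2 belong to the number-first branch, groups 3 and 4 to the number-last branch, and group 5 to the no-number branch. This is a sketch only; the names `AlternationParser` and `Parse` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationParser
{
    private static readonly Regex Pattern =
        new Regex(@"^(\d.*?) (.*)|(.*) (\d.*)|(.+)");

    // Hypothetical helper: returns (number, street) based on which branch matched.
    public static (string Number, string Street) Parse(string address)
    {
        Match m = Pattern.Match(address);
        if (!m.Success) return ("", "");                 // e.g. empty input
        if (m.Groups[1].Success)                         // number at the beginning
            return (m.Groups[1].Value, m.Groups[2].Value);
        if (m.Groups[4].Success)                         // number at the end
            return (m.Groups[4].Value, m.Groups[3].Value);
        return ("", m.Groups[5].Value);                  // no number at all
    }

    public static void Main()
    {
        foreach (string s in new[] { "19 Green Lane", "Ueckerstr. 20 b", "Hallgartenerstrasse Moritzstr." })
        {
            var (number, street) = Parse(s);
            Console.WriteLine($"<{number}> <{street}>");
        }
    }
}
```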
Upvotes: 0
Reputation: 47099
@"(?<=^\d[^ ]*) | (?=\d)"
used as a split pattern might work for you. It will not, however, work for
Hallgartenerstrasse Moritzstr.
since that string contains no number, so it ends up in element 0 of the result and not element 1:
Test:
using System;
using System.Text.RegularExpressions;

public class Example {
    public static void Main() {
        string[] inputs = {
            "Überbrückerstraße 24a",
            "34 Street Blah Blah",
            "Hallgartenerstrasse Moritzstr.",
            "Ueckerstr. 20 b"
        };
        foreach (string input in inputs) {
            // Split after a leading number token, or before a token that starts with a digit.
            string pat = @"(?<=^\d[^ ]*) | (?=\d)";
            string[] matches = Regex.Split(input, pat);
            foreach (string match in matches) {
                Console.Write("<{0}>", match);
            }
            Console.Write("\n");
        }
    }
}
Will output:
<Überbrückerstraße><24a>
<34><Street Blah Blah>
<Hallgartenerstrasse Moritzstr.>
<Ueckerstr.><20 b>
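The no-number case could be handled by checking how many parts the split produced. The sketch below is one possible way to do that; `SplitParser` and `Parse` are hypothetical names, and limiting the split to two parts is an added assumption so that inputs with several digit tokens do not produce more pieces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitParser
{
    private static readonly Regex Pattern = new Regex(@"(?<=^\d[^ ]*) | (?=\d)");

    // Hypothetical helper: returns (number, street) from the split result.
    public static (string Number, string Street) Parse(string address)
    {
        // Split at most once, so the result has one or two elements.
        string[] parts = Pattern.Split(address, 2);
        if (parts.Length == 1)
        {
            // Nothing to split on: treat the whole input as the street.
            return ("", parts[0]);
        }
        bool numberFirst = parts[0].Length > 0 && char.IsDigit(parts[0][0]);
        return numberFirst ? (parts[0], parts[1]) : (parts[1], parts[0]);
    }

    public static void Main()
    {
        var (number, street) = Parse("Hallgartenerstrasse Moritzstr.");
        Console.WriteLine($"<{number}> <{street}>");
    }
}
```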
Upvotes: 0
Reputation: 23958
http://www.phpliveregex.com/p/fWT
var match = Regex.Match(input, @"(.*)\s(\d+.*)");
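A short sketch of how the two capture groups might be read, assuming group 1 is taken as the street and group 2 as the number. `TrailingNumberParser` and `Parse` are illustrative names, and falling back to the whole input as the street when nothing matches is an assumption, not part of the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumberParser
{
    // Hypothetical helper: returns (number, street) for addresses ending in a number.
    public static (string Number, string Street) Parse(string input)
    {
        // Static Regex.Match takes the input first, then the pattern.
        Match match = Regex.Match(input, @"(.*)\s(\d+.*)");
        if (!match.Success)
        {
            // Assumed fallback: no whitespace followed by a digit was found.
            return ("", input);
        }
        return (match.Groups[2].Value, match.Groups[1].Value);
    }

    public static void Main()
    {
        var (number, street) = Parse("Elmshornerstraße 163");
        Console.WriteLine($"<{number}> <{street}>");
    }
}
```

Because the pattern requires whitespace before the digits, an input such as 34 Street Blah Blah does not match at all and falls through to the fallback branch.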
Upvotes: 1