Nick

Reputation: 2907

Regex match space-separated alphanumeric strings

I have strings containing space-separated addresses, and I want to separate the number from the street name.

So if we have:

Street Blah Blah 34

or

34 Street Blah Blah

I want one regex to match "Street Blah Blah" and another to match "34".

It can get more complex with addresses like this:

Überbrückerstraße 24a.

where it should return "24a" as the number and the rest as the street, or

Järnvägstationg. 3/B

where it should return "3/B" as the number and the rest as the street, and so on.

I am currently doing this in C#: I split each string on spaces, return whichever token contains at least one digit as the number, and return everything else as the street.

However, I was wondering whether it would be more elegant and more efficient to do this with a regex.

I've been fiddling with regexes but haven't found a robust approach so far. Any ideas?
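
For concreteness, that split-based approach might look roughly like the sketch below. It is not the exact code in use: the class and method names are hypothetical, and it assumes the first token containing a digit is the premise number.

```csharp
using System;
using System.Linq;

public static class TokenSplitter
{
    // Hypothetical helper: returns (number, street) for a space-separated address.
    public static (string Number, string Street) Split(string address)
    {
        string[] tokens = address.Split(' ');

        // Index of the first token that contains at least one digit, or -1.
        int idx = Array.FindIndex(tokens, t => t.Any(char.IsDigit));

        // No digit anywhere: the whole input is treated as the street.
        if (idx < 0) return ("", address);

        // Everything except the number token is joined back into the street.
        string street = string.Join(" ", tokens.Where((t, i) => i != idx));
        return (tokens[idx], street);
    }
}
```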

Here is some unit test data, in the order input street, expected premise number, expected street:

    [TestCase("Järvägstationg. 3/B", "3/B", "Järvägstationg.")]
    [TestCase("Überbrückerstraße 24a", "24a", "Überbrückerstraße")]
    [TestCase("Street Blah Blah 34", "34", "Street Blah Blah")]
    [TestCase("34 Street Blah Blah", "34", "Street Blah Blah")]
    [TestCase("Ueckerstr. 20 b", "20 b", "Ueckerstr.")]
    [TestCase("Elmshornerstraße 163", "163", "Elmshornerstraße")]
    [TestCase("Hallgartenerstrasse Moritzstr.", "", "Hallgartenerstrasse Moritzstr.")]
    [TestCase("19 Green Lane", "19", "Green Lane")]

I think the trickiest of these is

Ueckerstr. 20 b

so I don't mind if that one fails for now.

Upvotes: 0

Views: 250

Answers (3)

Thomas Ayoub

Reputation: 29431

If your input strings follow the same format, you can use:

(?<street>.*) (?<number>.*)

See Live demo

Then access it with:

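using System;
using System.Text.RegularExpressions;

var address = "Überbrückerstraße 24a.";
// The greedy "street" group takes everything up to the last space; "number" gets the rest.
var m = Regex.Matches(address, @"(?<street>.*) (?<number>.*)");
var street = m[0].Groups["street"].Value;
var streetNumber = m[0].Groups["number"].Value;
Console.WriteLine(string.Format("Street Name: {0}, at {1}", street, streetNumber));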

outputs:

Street Name: Überbrückerstraße, at 24a.

See live C#


Given the test data you added afterwards, I would use:

^(\d.*?) (.*)|(.*) (\d.*)|(.+)

where:

  • ^(\d.*?) (.*) matches the string with the number at the beginning;
  • (.*) (\d.*) matches the string with the number at the end;
  • (.+) matches the string that doesn't contain numbers. It must stay at the end or it will capture every case.
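
As a rough sketch of how the three alternatives could be consumed in C#, the helper below checks which capture groups took part in the match. The class and method names are hypothetical, and the group numbering assumes the pattern exactly as written above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressSplitter
{
    // Groups 1-2: number first; groups 3-4: number last; group 5: no number.
    private static readonly Regex Pattern =
        new Regex(@"^(\d.*?) (.*)|(.*) (\d.*)|(.+)");

    // Hypothetical helper: returns (number, street) for one address line.
    public static (string Number, string Street) Split(string address)
    {
        Match m = Pattern.Match(address);

        // Nothing matched (e.g. an empty string): return the input unchanged.
        if (!m.Success) return ("", address);

        // First alternative: the number is at the beginning.
        if (m.Groups[1].Success) return (m.Groups[1].Value, m.Groups[2].Value);

        // Second alternative: the number is at the end.
        if (m.Groups[4].Success) return (m.Groups[4].Value, m.Groups[3].Value);

        // Third alternative: no number was found.
        return ("", m.Groups[5].Value);
    }
}
```

Checking `Group.Success` rather than testing for an empty `Value` keeps the three alternatives apart even if one of them captures an empty string.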

See Demo

Upvotes: 0

Andreas Louv

Reputation: 47099

Using @"(?<=^\d[^ ]*) | (?=\d)" as the pattern for Regex.Split might work for you. It will not, however, work for Hallgartenerstrasse Moritzstr.: with no number present there is nothing to split on, so the whole string ends up in element 0 of the result rather than element 1.

Test:

using System;
using System.Text.RegularExpressions;

public class Example {
    public static void Main() {
        string[] inputs = {
            "Überbrückerstraße 24a",
            "34 Street Blah Blah",
            "Hallgartenerstrasse Moritzstr.",
            "Ueckerstr. 20 b"
        };
        foreach (string input in inputs) {
            string pat = @"(?<=^\d[^ ]*) | (?=\d)";
            string[] matches = Regex.Split(input, pat);
            foreach (string match in matches) {
                Console.Write("<{0}>", match);
            }
            Console.Write("\n");
        }
    }
}

Will output:

<Überbrückerstraße><24a>
<34><Street Blah Blah>
<Hallgartenerstrasse Moritzstr.>
<Ueckerstr.><20 b>

Upvotes: 0

Andreas

Reputation: 23958

http://www.phpliveregex.com/p/fWT

var match = Regex.Match(input, @"(.*)\s(\d+.*)");

Upvotes: 1
