kipopoy

Reputation: 23

How to extract the first 3 free standing characters from a string?

I have a program that needs to parse town names. Sometimes users enter the correct town name, but often they enter the post code instead.

If I cannot match the input against a valid town name, I assume that it contains the post code. The first 3 free-standing characters of the post code uniquely identify the town.

Post codes have this format: 3 letters followed by 3 digits, e.g. ABC123.

However, some users enter the digits before the letters, and some combine the town name and the post code, e.g.

123ABC
Pretty city ABC123

How do I extract the first 3 free standing characters?

Free standing = there are no other letters immediately to the left or right of the 3 characters (digits, spaces, or the start/end of the string are fine).

In each of the strings below, ABC is the first group of 3 free-standing characters.

ABC123
123ABC
ABC 123
123 ABC
123 ABC 456
ABC12DEF
123 ABC DEF
DE 123 ABC
Pretty city ABC123

These next strings do not have 3 free standing characters.

123ABCDEF
ABCD123
123ABCD
123 ABCD
Somename1234
1234Somename

Case is irrelevant.

Here are my attempts

Using regex. This does not work for "Pretty city ABC123", because it matches "Pre":

    Regex rgx = new Regex("[a-zA-Z]{3}");
    string hamster = "ABC123";
    var code = rgx.Match(hamster);

Awkward function

    private static string GetCode(string pig)
    {
      var code = "";
      var canstart = true;
      for (int i = 0; i < pig.Length; i++)
      {
        //Console.WriteLine(code);
        if (code.Length == 3)
        {
          if (char.IsLetter(pig[i]))
          {
            canstart = false;
            code = "";
          }
          else
          {
            break;
          }
        }
        if (char.IsLetter(pig[i]) && canstart)
        {
          code += pig[i];
        }
        else if (!char.IsLetter(pig[i]) && !canstart)
        {
          canstart = true;
        }
      }

      if (code.Length != 3)
      {
        code = "";
      }
      return code;
    }

Upvotes: 2

Views: 306

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You can use

    (?<![a-zA-Z])[a-zA-Z]{3}(?![a-zA-Z])

Details:

  • (?<![a-zA-Z]) - a negative lookbehind that matches a location that is not immediately preceded by an ASCII letter
  • [a-zA-Z]{3} - three ASCII letters
  • (?![a-zA-Z]) - a negative lookahead that matches a location that is not immediately followed by an ASCII letter.

In C#:

    var rgx = new Regex(@"(?<![a-zA-Z])[a-zA-Z]{3}(?![a-zA-Z])");
    var hamster = "ABC123";
    // Match never returns null; when nothing matches, Value is "".
    var code = rgx.Match(hamster).Value;
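
To check the pattern against the strings listed in the question, something like the following small console program could be used. It is only a sketch: the class name TownCodeDemo and the helper GetCode are made up for illustration, and upper-casing the result is an assumption based on the question's remark that case is irrelevant.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TownCodeDemo
{
    // Same pattern as above: exactly three ASCII letters with no ASCII letter on either side.
    private static readonly Regex CodePattern =
        new Regex(@"(?<![a-zA-Z])[a-zA-Z]{3}(?![a-zA-Z])");

    // Hypothetical helper: returns the first free-standing letter triple
    // in upper case, or an empty string when there is none.
    public static string GetCode(string input)
    {
        Match match = CodePattern.Match(input);
        return match.Success ? match.Value.ToUpperInvariant() : "";
    }

    public static void Main()
    {
        // Sample inputs taken from the question.
        string[] samples =
        {
            "ABC123", "123ABC", "ABC 123", "123 ABC", "123 ABC 456",
            "ABC12DEF", "123 ABC DEF", "DE 123 ABC", "Pretty city ABC123",
            "123ABCDEF", "ABCD123", "123ABCD", "123 ABCD",
            "Somename1234", "1234Somename"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine($"{sample,-20} -> \"{GetCode(sample)}\"");
        }
    }
}
```

With these inputs, the first nine strings should yield ABC and the remaining six an empty string, which matches the two lists in the question.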

Upvotes: 2
