Matt Evans

Reputation: 7575

Regex to split number and string except for ordinal indicators

I'm looking for a regex that inserts a space wherever digits and letters are concatenated in user input, except for ordinal indicators such as 1st, 11th, 22nd, 33rd and 44th.

So this string:

Hi is this available 18dec to 21st dec

should be returned as

Hi is this available 18 dec to 21st dec

Using this expression

 Regex.Replace(value, @"(\d)(\p{L})", "$1 $2")

gives

Hi is this available 18 dec to 21 st dec

EDIT:

As per the comment from @juharr, dec12th should also be changed to dec 12th.

Upvotes: 3

Views: 388

Answers (1)

Wiktor Stribiżew

Reputation: 626774

You may use a solution like:

using System;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var s = "Hi is this available 18dec to 21st dec 2nd dec 1st jan dec12th";
        var res = Regex.Replace(s, @"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))", repl);
        Console.WriteLine(res);
        // => Hi is this available 18 dec to 21st dec 2nd dec 1st jan dec 12th
    }

    // This is the callback method that does all the work
    public static string repl(Match m)
    {
        var res = new StringBuilder();
        res.Append(m.Groups[1].Value);  // Add what was matched in Group 1 (empty if it did not match)
        if (m.Groups[1].Success)        // If a letter preceded the number...
            res.Append(" ");            // Append a space to separate the word from the number
        res.Append(m.Groups[2].Value);  // Add Group 2 value (number)
        if (m.Groups[4].Success)        // If there is a word (not an st/nd/rd/th suffix)...
            res.Append(" ");            // Add a space to separate the number from the word
        res.Append(m.Groups[3].Value);  // Add what was captured in Group 3 (suffix or word)
        return res.ToString();
    }
}


The regex used is

(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))

It matches:

  • (\p{L})? - an optional Group 1 matching a single letter
  • (\d+) - Group 2: one or more digits
  • (st|[nr]d|th|(\p{L}+)) - Group 3 matching the following alternatives
    • st - st
    • [nr]d - nd or rd
    • th - th
    • (\p{L}+) - Group 4: any 1 or more Unicode letters

The repl callback method takes the match object and uses additional logic to build the correct replacement string based on whether the optional groups matched or not.
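To make the role of the optional groups concrete, here is a minimal sketch that reports which of Group 1 and Group 4 took part in the first match for a few sample inputs. The class name GroupInspector and the helper Describe are hypothetical names used only for this illustration; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupInspector
{
    // Same pattern as in the answer.
    private const string Pattern = @"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))";

    // Hypothetical helper: reports which optional groups took part in the first match.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return "no match";
        return $"leading letter: {m.Groups[1].Success}, trailing word: {m.Groups[4].Success}";
    }

    public static void Main()
    {
        foreach (var sample in new[] { "18dec", "21st", "dec12th" })
            Console.WriteLine($"{sample} -> {Describe(sample)}");
        // 18dec -> leading letter: False, trailing word: True
        // 21st -> leading letter: False, trailing word: False
        // dec12th -> leading letter: True, trailing word: False
    }
}
```

The callback adds a space before the number only when Group 1 succeeded, and a space after it only when Group 4 succeeded, which is why 21st passes through unchanged.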

Pass RegexOptions.IgnoreCase if you need a case-insensitive search and replace. Pass RegexOptions.ECMAScript if you only want \d to match ASCII digits; note that \p{L} will still match any Unicode letter even with this option.
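As a rough illustration of the effect of RegexOptions.IgnoreCase, the sketch below runs the same pattern over input with upper-case suffixes. The class OrdinalSpacing and the helper Normalize are hypothetical names introduced here, and the callback is written as a lambda that should behave like repl.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrdinalSpacing
{
    // Same pattern as in the answer.
    private const string Pattern = @"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))";

    // Hypothetical helper: applies the pattern with the given options.
    public static string Normalize(string input, RegexOptions options)
    {
        return Regex.Replace(input, Pattern, m =>
            m.Groups[1].Value
            + (m.Groups[1].Success ? " " : "")   // space between a leading letter and the number
            + m.Groups[2].Value
            + (m.Groups[4].Success ? " " : "")   // space between the number and a non-ordinal word
            + m.Groups[3].Value,
            options);
    }

    public static void Main()
    {
        var s = "available 18DEC to 21ST dec";
        Console.WriteLine(Normalize(s, RegexOptions.None));
        // => available 18 DEC to 21 ST dec
        Console.WriteLine(Normalize(s, RegexOptions.IgnoreCase));
        // => available 18 DEC to 21ST dec
    }
}
```

Without the option, the upper-case ST is not recognised as an ordinal suffix, so it falls through to the generic letter group and gets split off from the number.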

Upvotes: 3
