Reputation: 7575
I'm looking for a regex that inserts a space wherever a number and a word are run together in user input, except before ordinal indicators, e.g. 1st, 11th, 22nd, 33rd, 44th, etc.
So this string:
Hi is this available 18dec to 21st dec
should be returned as
Hi is this available 18 dec to 21st dec
Using this expression
Regex.Replace(value, @"(\d)(\p{L})", "$1 $2")
gives
Hi is this available 18 dec to 21 st dec
EDIT:
As per the comment from @juharr, dec12th should also be changed to dec 12th.
Upvotes: 3
Views: 388
Reputation: 626774
You may use a solution like:
// Requires: using System; using System.Text; using System.Text.RegularExpressions;
var s = "Hi is this available 18dec to 21st dec 2nd dec 1st jan dec12th";
var res = Regex.Replace(s, @"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))", repl);
Console.WriteLine(res);
// => Hi is this available 18 dec to 21st dec 2nd dec 1st jan dec 12th

// This is the callback method that does all the work
public static string repl(Match m)
{
    var res = new StringBuilder();
    res.Append(m.Groups[1].Value);      // Add what was matched in Group 1
    if (m.Groups[1].Success)            // If it matched at all...
        res.Append(" ");                // Append a space to separate word from number
    res.Append(m.Groups[2].Value);      // Add Group 2 value (number)
    if (m.Groups[4].Success)            // If there is a word (not st/th/rd/nd suffix)...
        res.Append(" ");                // Add a space to separate the number from the word
    res.Append(m.Groups[3].Value);      // Add what was captured in Group 3
    return res.ToString();
}
See the C# demo.
The regex used is
(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))
See its demo online. It matches:
- (\p{L})? - an optional Group 1 matching a single letter
- (\d+) - Group 2: one or more digits
- (st|[nr]d|th|(\p{L}+)) - Group 3, matching one of the following alternatives:
  - st - st
  - [nr]d - nd or rd
  - th - th
  - (\p{L}+) - Group 4: one or more Unicode letters
The repl callback method takes the match object and uses additional logic to build the correct replacement string based on whether the optional groups matched or not.
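To make the role of the optional groups concrete, here is a small sketch that prints which groups took part in the first match for a few inputs. The GroupTrace class and its Describe helper are hypothetical names used only for illustration; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupTrace
{
    // Same pattern as in the answer.
    private static readonly Regex Pattern =
        new Regex(@"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))");

    // Hypothetical helper: reports the groups of the first match in the input.
    public static string Describe(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return "no match";
        return $"G1='{m.Groups[1].Value}' ({m.Groups[1].Success}), " +
               $"G2='{m.Groups[2].Value}', G3='{m.Groups[3].Value}', " +
               $"G4 matched: {m.Groups[4].Success}";
    }

    public static void Main()
    {
        // Number followed by a word: Group 4 matches, so a space goes after the number.
        Console.WriteLine(Describe("18dec"));
        // G1='' (False), G2='18', G3='dec', G4 matched: True

        // Ordinal suffix: Group 3 matches via "st", Group 4 does not take part.
        Console.WriteLine(Describe("21st"));
        // G1='' (False), G2='21', G3='st', G4 matched: False

        // Letter before the number: Group 1 matches, so a space goes before the number.
        Console.WriteLine(Describe("dec12th"));
        // G1='c' (True), G2='12', G3='th', G4 matched: False
    }
}
```

Group 1 only ever holds the single letter immediately before the digits (the "c" of "dec"), which is all the callback needs in order to know that a separating space is required there.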
Pass the RegexOptions.IgnoreCase option if you need a case-insensitive search and replace, and RegexOptions.ECMAScript if you only want to match ASCII digits with \d (note that \p{L} will still match any Unicode letter even if you pass this option to the regex).
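A minimal sketch of how these options change the result, assuming the same pattern and callback logic as above. OptionsDemo and AddSpaces are hypothetical names, and the Arabic-Indic digits are just one example of non-ASCII digits that \d matches by default. No expected output is given for the ECMAScript call; per the note above, \d should no longer match those digits, so that input should come back unchanged.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    private const string Pattern = @"(\p{L})?(\d+)(st|[nr]d|th|(\p{L}+))";

    // Same logic as the repl callback in the answer.
    private static string Repl(Match m)
    {
        var res = new StringBuilder();
        res.Append(m.Groups[1].Value);
        if (m.Groups[1].Success) res.Append(' ');
        res.Append(m.Groups[2].Value);
        if (m.Groups[4].Success) res.Append(' ');
        res.Append(m.Groups[3].Value);
        return res.ToString();
    }

    // Hypothetical wrapper so the options can be varied per call.
    public static string AddSpaces(string input, RegexOptions options = RegexOptions.None)
        => Regex.Replace(input, Pattern, Repl, options);

    public static void Main()
    {
        // Case-sensitive by default: "ST" is not recognised as an ordinal suffix.
        Console.WriteLine(AddSpaces("21ST Dec"));                          // 21 ST Dec
        // With IgnoreCase the suffix alternatives also match upper case.
        Console.WriteLine(AddSpaces("21ST Dec", RegexOptions.IgnoreCase)); // 21ST Dec

        // "\u0661\u0668" are the Arabic-Indic digits 1 and 8.
        string arabicIndic = "\u0661\u0668dec";
        // By default \d matches any Unicode decimal digit, so a space is inserted.
        Console.WriteLine(AddSpaces(arabicIndic) == "\u0661\u0668 dec");   // True
        // With ECMAScript, \d is restricted to ASCII 0-9.
        Console.WriteLine(AddSpaces(arabicIndic, RegexOptions.ECMAScript) == arabicIndic);
    }
}
```

Note that RegexOptions.ECMAScript can only be combined with RegexOptions.IgnoreCase and RegexOptions.Multiline (and RegexOptions.Compiled), so the two options mentioned here can be used together.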
Upvotes: 3