Reputation: 598
I want to split camelCase or PascalCase words into a space-separated collection of words.
So far, I have:
Regex.Replace(value, @"(\B[A-Z]+?(?=[A-Z][^A-Z])|\B[A-Z]+?(?=[^A-Z]))", " $0", RegexOptions.Compiled);
It works fine for converting "TestWord" to "Test Word" and for leaving single words untouched, e.g. Testing remains Testing.
However, ABCTest gets converted to A B C Test when I would prefer ABC Test.
Upvotes: 8
Views: 494
Reputation: 627380
Here is my attempt:
(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)
This regex can be used with Regex.Replace and " $0" (a space followed by the whole match) as the replacement string:
Regex.Replace(value, @"(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)", " $0", RegexOptions.Compiled);
Regex Explanation:
(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)
- the first alternative, which matches one or more uppercase letters that are not preceded by the start of the string, a word boundary, or another uppercase letter, and that are followed by a lowercase letter or a word boundary.
(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)
- the second alternative, which matches a single uppercase letter that is not preceded by the start of the string followed by optional uppercase letters, or by a word boundary, and that is followed by a lowercase letter or by a word boundary that is not preceded by optional uppercase letters.
Upvotes: 1
Reputation: 9455
Is using a regex a hard requirement? To be honest, I wouldn't use regular expressions for this at all: they're hard to debug and not especially readable.
I would go with a small, reusable, easily testable extension method:
using System;
using System.Globalization;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string[] inputs = new[]
        {
            "ABCTest",
            "HelloWorld",
            "testTest$Test",
            "aaҚbb"
        };
        var output = inputs.Select(x => x.SplitWithSpaces(CultureInfo.CurrentUICulture));
        foreach (string x in output)
        {
            Console.WriteLine(x);
        }
        Console.Read();
    }
}

public static class StringExtensions
{
    // True when lowercasing leaves the character unchanged. Note that this also
    // holds for characters without case, such as digits and punctuation.
    public static bool IsLowerCase(this TextInfo textInfo, char input)
    {
        return textInfo.ToLower(input) == input;
    }

    public static string SplitWithSpaces(this string input, CultureInfo culture = null)
    {
        if (culture == null)
        {
            culture = CultureInfo.InvariantCulture;
        }
        TextInfo textInfo = culture.TextInfo;
        StringBuilder sb = new StringBuilder(input);
        for (int i = 1; i < sb.Length; i++)
        {
            int previous = i - 1;
            if (textInfo.IsLowerCase(sb[previous]))
            {
                // sb[previous] starts a lowercase run, so the word is assumed to
                // begin one character earlier, at the capital in front of it.
                int insertLocation = previous - 1;
                if (insertLocation > 0)
                {
                    sb.Insert(insertLocation, ' ');
                }
                // Skip the rest of the lowercase run.
                while (i < sb.Length && textInfo.IsLowerCase(sb[i]))
                {
                    i++;
                }
            }
        }
        return sb.ToString();
    }
}
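For comparison, the same non-regex idea can be sketched as a single forward pass that builds a new string instead of inserting into the existing one. This is only an illustration, not the code above: WordSplitSketch and SplitOnCase are made-up names, and it relies on char.IsUpper/char.IsLower rather than a culture's TextInfo, so characters without case (digits, punctuation) never trigger a split.

```csharp
using System;
using System.Text;

public static class WordSplitSketch
{
    // Hypothetical helper: adds a space before an uppercase letter when it
    // follows a lowercase letter, or when it ends an acronym (previous char
    // is uppercase and the next char is lowercase).
    public static string SplitOnCase(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input ?? string.Empty;
        }

        var sb = new StringBuilder(input.Length + 8);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (i > 0 && char.IsUpper(c))
            {
                bool afterLower = char.IsLower(input[i - 1]);
                bool endsAcronym = char.IsUpper(input[i - 1])
                    && i + 1 < input.Length
                    && char.IsLower(input[i + 1]);
                if (afterLower || endsAcronym)
                {
                    sb.Append(' ');
                }
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, "ABCTest" becomes "ABC Test" and "HelloWorld" becomes "Hello World". Unlike the extension method above, "testTest$Test" comes out as "test Test$Test", because the '$' is treated as neither upper nor lower case.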
Upvotes: 0
Reputation: 2269
Try:
[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+
string strText = " TestWord asdfDasdf ABCDef";
string[] matches = Regex.Matches(strText, @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
string result = String.Join(" ", matches);
// result == "Test Word asdf Dasdf ABC Def"
In the example string:
TestWord qwerDasdf
ABCTest Testing ((*&^%$CamelCase!"£$%^^))
asdfAasdf
AaBbbCD
[A-Z][a-z]+ matches: Test, Word, Dasdf, Test, Testing, Camel, Case, Aasdf, Aa, Bbb
[A-Z]+(?=[A-Z][a-z]) matches: ABC
[a-z]+ matches: qwer, asdf
[A-Z]+ matches: CD
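If it helps, the match-and-join steps above could be wrapped in a small helper along these lines. This is just a sketch: CaseSplitter and SplitWords are made-up names, and the pattern is the one from this answer, unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaseSplitter
{
    // Same four alternatives as above: Capitalised word, acronym followed by
    // a Capitalised word, lowercase run, trailing uppercase run.
    private static readonly Regex WordPattern = new Regex(
        @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+",
        RegexOptions.Compiled);

    // Hypothetical helper: collects every match and joins them with spaces.
    public static string SplitWords(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        var words = WordPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value);
        return string.Join(" ", words);
    }
}
```

For example, CaseSplitter.SplitWords("ABCTest") gives "ABC Test" and CaseSplitter.SplitWords("Testing") gives "Testing". One thing to keep in mind: because only the matched runs are kept, anything outside A-Z and a-z (digits, punctuation, non-ASCII letters) is dropped from the result.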
Upvotes: 4