Reputation: 1018
There are tons of posts about how to capitalize the first letter in C#, but I am specifically struggling with how to do this while ignoring prefixed non-letter characters and tags, including the tags' contents. E.g.,
<style=blah>capitalize the word, 'capitalize'</style>
How can I ignore potential <> tags and the contents within them (or non-letter chars before the word, like an asterisk *), THEN capitalize "capitalize"?
I tried:
public static string CapitalizeFirstCharToUpperRegex(string str)
{
    // Check for empty string.
    if (string.IsNullOrEmpty(str))
        return string.Empty;
    // Return char and concat substring.
    // Start @ first char, no matter what (avoid <tags>, etc)
    string pattern = @"(^.*?)([a-z])(.+)";
    // Extract middle, then upper 1st char
    string middleUpperFirst = Regex.Replace(str, pattern, "$2");
    middleUpperFirst = CapitalizeFirstCharToUpper(str); // Works
    // Inject the middle back in
    string final = $"$1{middleUpperFirst}$3";
    return Regex.Replace(str, pattern, final);
}
EDIT:
Input: <style=foo>first non-tagged word 1st char upper</style>
Expected output: <style=foo>First non-tagged word 1st char upper</style>
Upvotes: 1
Views: 1781
Reputation: 626845
You may use
<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)
The regex does the following:
- <[^<>]*> - matches <, any 0+ chars other than < and >, and then >
- | - or
- (?<!\p{L}) - finds a position not immediately preceded by a letter
- (\p{L}) - captures any letter into Group 1
- (\p{L}*) - captures any 0+ letters into Group 2 (this is necessary if you want to lowercase the rest of the word).
Then, check if Group 1 matched, and if yes, capitalize the first group value and lowercase the second one; otherwise, return the whole match value unchanged:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() + m.Groups[2].Value.ToLower() :
        m.Value);
If you do not need to lowercase the rest of the word, remove the second group and the code related to it:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() : m.Value);
To only replace the first occurrence using this approach, you need to set a flag and reverse it once the first match is found:
var s = "<style=foo>first non-tagged word 1st char upper</style>";
var found = false;
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m => {
    if (m.Groups[1].Success && !found) {
        found = !found;
        return m.Groups[1].Value.ToUpper();
    } else {
        return m.Value;
    }
});
Console.WriteLine(result); // => <style=foo>First non-tagged word 1st char upper</style>
See the C# demo.
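For reuse, the first-occurrence approach above could be wrapped in a method. This is only a minimal sketch: the class name TextCasing and the method name CapitalizeFirstLetterOutsideTags are hypothetical, the empty-string handling mirrors the question's attempt, and ToUpperInvariant is my assumption (swap in ToUpper if culture-specific casing is wanted).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextCasing
{
    // Hypothetical helper (name is an assumption, not from the answer):
    // uppercases the first letter found outside <...> tags.
    public static string CapitalizeFirstLetterOutsideTags(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        var done = false; // set once the first letter has been uppercased
        return Regex.Replace(input, @"<[^<>]*>|(?<!\p{L})(\p{L})", m =>
        {
            // Group 1 only participates when the letter alternative matched,
            // so tags are matched and returned unchanged.
            if (m.Groups[1].Success && !done)
            {
                done = true;
                return m.Groups[1].Value.ToUpperInvariant();
            }
            return m.Value;
        });
    }

    public static void Main()
    {
        Console.WriteLine(CapitalizeFirstLetterOutsideTags(
            "<style=blah>capitalize the word, 'capitalize'</style>"));
        // => <style=blah>Capitalize the word, 'capitalize'</style>
        Console.WriteLine(CapitalizeFirstLetterOutsideTags("* <b>hello</b> world"));
        // => * <b>Hello</b> world
    }
}
```

The second call shows that a leading asterisk and whitespace are skipped as well, since neither is matched by \p{L}.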
Upvotes: 1
Reputation: 7973
Using the regex look-behind feature, you can match the first 'capitalize' without the tag's closing > bracket, and then capitalize the matched text.
The regex is the following:
(?<=<.*>)\w+
It will match the first word that immediately follows a tag's closing > bracket.
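One possible way to apply this pattern, sketched under some assumptions: the class and method names (LookbehindDemo, CapitalizeFirstWordAfterTag) are hypothetical, and the replacement is limited to the first match through the count argument of the instance Regex.Replace overload. Note that the pattern only matches a word directly after a >, so input with no tag, or with other characters between the tag and the word, is returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern from the answer: word characters immediately preceded by <...>.
    private static readonly Regex WordAfterTag = new Regex(@"(?<=<.*>)\w+");

    // Hypothetical helper name; uppercases the first character of the
    // first word that directly follows a tag.
    public static string CapitalizeFirstWordAfterTag(string input)
    {
        // The last argument (count: 1) replaces only the first match.
        return WordAfterTag.Replace(
            input,
            m => char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1),
            1);
    }

    public static void Main()
    {
        Console.WriteLine(CapitalizeFirstWordAfterTag(
            "<style=foo>first non-tagged word 1st char upper</style>"));
        // => <style=foo>First non-tagged word 1st char upper</style>
    }
}
```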
Upvotes: 1