Reputation: 38335
Given the following sample strings:
PP12111 LOREM IPSUM TM ENCORE
LOREM PP12111 IPSUM TM ENCORE
LOREM IPSUM ENCORE TM PP12111
LOREM PP12111 PP12111 TM ENCORE
What would be a .NET RegEx to set title case and then convert any word containing both numbers and letters to upper case (see note below), giving:
PP12111 Lorem Ipsum TM Encore
Lorem PP12111 Ipsum TM Encore
Lorem Ipsum Encore TM PP12111
Lorem PP12111 PP12111 TM Encore
Alternatively, I can start with everything set to Title Case so only the words containing numbers and letters need to be set to upper case:
Pp12111 Lorem Ipsum TM Encore
Lorem Pp12111 Ipsum TM Encore
Lorem Ipsum Encore TM Pp12111
Lorem Pp12111 Pp12111 TM Encore
Note: if any variant of TM exists (tm, Tm, tM), then it should be fully upper case. The TM may appear bare or in parentheses, as in "lorem ipsum TM valor" or "lorem ipsum (TM) valor".
Here is a pure string manipulation method that works; I would think that a RegEx solution may be a better fit?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

internal static class Program
{
    private static void Main( string[] args )
    {
        var phrases = new[]
        {
            "PP12111 LOREM IPSUM TM ENCORE", "LOREM PP12111 IPSUM TM ENCORE",
            "LOREM IPSUM ENCORE TM PP12111", "LOREM PP12111 PP12111 TM ENCORE",
        };
        Test(phrases);
    }

    private static void Test( IList<string> phrases )
    {
        var ti = Thread.CurrentThread.CurrentCulture.TextInfo;
        for( int i = 0; i < phrases.Count; i++ )
        {
            // Lower-case first: ToTitleCase leaves all-upper-case words unchanged.
            string p = ti.ToTitleCase( phrases[i].ToLower() );
            string[] words = p.Split( ' ' );
            for( int j = 0; j < words.Length; j++ )
            {
                string word = words[j];
                // Any word containing a digit (e.g. "Pp12111") goes fully upper case.
                if( word.ToCharArray().Any( Char.IsNumber ) )
                {
                    word = word.ToUpper();
                }
                // After the split a word has no surrounding spaces, so compare the bare word.
                words[j] = word == "Tm" ? "TM" : word.Replace( "(Tm)", "(TM)" );
            }
            phrases[i] = string.Join( " ", words );
            Console.WriteLine( phrases[i] );
        }
    }
}
Upvotes: 1
Views: 3434
Reputation: 112299
You can use a regex with a MatchEvaluator like this (ti is the TextInfo from the question's code):
MatchEvaluator evaluator = m => ti.ToTitleCase(m.Value.ToLower());
string result = Regex.Replace(input, @"\b(?!TM\b)[A-Z']+\b", evaluator,
RegexOptions.IgnoreCase);
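\b
Is a word boundary.
pos(?!suffix)
Matches a position that is not followed by suffix (a negative lookahead).
\b(?!TM\b)
A word boundary that is not followed by the whole word TM.
[A-Z']+
A run of letters and apostrophes, so a word without digits.
Together: a word boundary not followed by "TM", then a run of letters A through Z (or apostrophes), then another word boundary. With RegexOptions.IgnoreCase, lower-case letters match as well. A word such as PP12111 is skipped because there is no word boundary between its letters and its digits.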
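To see what the pattern does and does not touch, here is a minimal, self-contained sketch. The class and method names (TitleCaseDemo, TitleCaseWords) are made up for illustration, and it uses the invariant culture rather than the current thread culture from the question, which is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TitleCaseDemo
{
    // Assumption: invariant culture; the question uses the current thread culture.
    private static readonly TextInfo Ti = CultureInfo.InvariantCulture.TextInfo;

    // Hypothetical helper wrapping the Regex.Replace call from the answer.
    public static string TitleCaseWords(string input)
    {
        // Each match is lower-cased first, because ToTitleCase
        // leaves words that are entirely upper case unchanged.
        return Regex.Replace(
            input,
            @"\b(?!TM\b)[A-Z']+\b",
            m => Ti.ToTitleCase(m.Value.ToLowerInvariant()),
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Upper-case input: "PP12111" and "TM" are left alone.
        Console.WriteLine(TitleCaseWords("LOREM PP12111 IPSUM TM ENCORE"));
        // prints "Lorem PP12111 Ipsum TM Encore"

        // Lower-case input: "pp12111" and "tm" are also left alone,
        // so they stay lower case and need a separate step.
        Console.WriteLine(TitleCaseWords("lorem pp12111 ipsum tm encore"));
        // prints "Lorem pp12111 Ipsum tm Encore"
    }
}
```

The second call shows the limitation: the pattern only skips mixed words and TM, it never upper-cases them, so input that is not already upper case needs extra handling.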
UPDATE #1
Upper casing "tm", "Tm", "tM":
If everything that is not title-cased may be upper case, the easiest solution is to upper-case the input first: input.ToUpper(). Otherwise, run a second regex replace on the result:
result = Regex.Replace(result, @"\btm\b", "TM", RegexOptions.IgnoreCase);
UPDATE #2
If you want to upper case several words, you can just use another match evaluator:
MatchEvaluator toUpperCase = m => m.Value.ToUpper();
result = Regex.Replace(result, @"\b(tm|xxx|yyy)\b", toUpperCase,
RegexOptions.IgnoreCase);
tm|xxx|yyy specifies the words to be upper cased ("tm", "xxx" or "yyy").
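Putting the updates together, the whole conversion could be chained roughly as below. This is a sketch, not a definitive implementation: CasingPipeline and Normalize are hypothetical names, the invariant culture is an assumption, and the middle step (upper-casing any word that contains a digit) is an addition that covers inputs such as "Pp12111" from the question's alternative starting point.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class CasingPipeline
{
    // Assumption: invariant culture instead of the current thread culture.
    private static readonly TextInfo Ti = CultureInfo.InvariantCulture.TextInfo;

    // Hypothetical helper: upperWords lists the words to force to upper case.
    public static string Normalize(string input, params string[] upperWords)
    {
        // Step 1: title-case every word made of letters/apostrophes only.
        string result = Regex.Replace(
            input, @"\b[A-Z']+\b",
            m => Ti.ToTitleCase(m.Value.ToLowerInvariant()),
            RegexOptions.IgnoreCase);

        // Step 2 (assumed rule): upper-case any word containing a digit.
        result = Regex.Replace(
            result, @"\b\w*\d\w*\b",
            m => m.Value.ToUpperInvariant());

        // Step 3: upper-case the listed words, matched as whole words only.
        if (upperWords.Length > 0)
        {
            string alternation =
                string.Join("|", upperWords.Select(w => Regex.Escape(w)));
            result = Regex.Replace(
                result, @"\b(" + alternation + @")\b",
                m => m.Value.ToUpperInvariant(),
                RegexOptions.IgnoreCase);
        }

        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("lorem pp12111 ipsum (tm) encore", "tm"));
        // prints "Lorem PP12111 Ipsum (TM) Encore"
    }
}
```

Because the word list is wrapped in \b...\b, a word that merely contains "tm" (for example "oatmeal") is not affected by step 3.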
Upvotes: 2
Reputation: 1813
Here's a previously asked close match for you to review: Regular Expression Uppercase Replacement in C#. A regular expression alone won't be enough here; you'll have to write a MatchEvaluator function to get everything into upper case.
edit: Seeing "Note: if any variant of TM exists (tm, Tm, tM), then it should be full upper case. Where the TM could be "lorem ipsum TM valor" or "lorem ipsum (TM) valor"." makes me think you should stop considering a regex altogether. What about words such as oatmeal or stuntmen, which contain "tm"?
Yes, you probably can write one regex that will find all of the cases, or a good and thorough MatchEvaluator that will take your logic into account. However, you're describing the problem in terms that make me think you're unfamiliar with regular expressions. So it's hard for me to think this is a good answer for you; it would be a "stunty" solution rather than anything that should go into production.
Upvotes: 0
Reputation: 20889
First: LowerCase Everything.
Second: Split sentence into words.
For each word:
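Check whether the word is exactly two letters, or a mix of letters and numbers: ^([a-z]{2}|(?=[a-z]*[0-9])[a-z0-9]{2,})$ (the lookahead requires at least one digit; without it, [a-z0-9]{2,} would also match every plain word).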
Match -> UpperCase it.
No Match-> TitleCase it.
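The steps above can be sketched as follows. The names PerWordCasing and Apply are hypothetical, the invariant culture is an assumption, and the pattern is anchored and requires at least one digit in the "letters and numbers" branch, which is one reading of the check described above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PerWordCasing
{
    // Assumption: invariant culture instead of the current thread culture.
    private static readonly TextInfo Ti = CultureInfo.InvariantCulture.TextInfo;

    // Exactly two letters, or letters/digits with at least one digit.
    private static readonly Regex UpperCandidate =
        new Regex(@"^([a-z]{2}|(?=[a-z]*[0-9])[a-z0-9]{2,})$");

    // Hypothetical helper implementing: lower-case, split, then decide per word.
    public static string Apply(string sentence)
    {
        var words = sentence
            .ToLowerInvariant()
            .Split(' ')
            .Select(w => UpperCandidate.IsMatch(w)
                ? w.ToUpperInvariant()   // match: upper-case it
                : Ti.ToTitleCase(w));    // no match: title-case it
        return string.Join(" ", words);
    }

    public static void Main()
    {
        Console.WriteLine(Apply("LOREM PP12111 IPSUM TM ENCORE"));
        // prints "Lorem PP12111 Ipsum TM Encore"
    }
}
```

Two consequences of this approach are worth keeping in mind: every two-letter word (such as "is" or "of") is upper-cased, not just "tm", and a parenthesised "(tm)" does not match the anchored pattern, so it would need extra handling.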
Upvotes: -1