Lill Lansey

Reputation: 4915

In C# what is the difference between ToUpper() and ToUpperInvariant()?

Can you give an example where the results might be different?

Upvotes: 164

Views: 84363

Answers (6)

taylonr

Reputation: 10790

ToUpperInvariant uses the rules from the invariant culture.

Upvotes: 3

krzychu

Reputation: 3687

String.ToUpper and String.ToLower can give different results under different cultures. The best-known example is Turkish, where converting the lowercase Latin "i" to uppercase results not in the capital Latin "I" (U+0049) but in the dotted capital "İ" (U+0130).

Figure: Capitalization of I depending on culture (upper row: lowercase letters; lower row: uppercase letters).

Because I found this confusing even with the above picture (source), I wrote a program (see source code below) to see the exact output for the Turkish example:

# Lowercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - i (\u0069) | I (\u0049)     | İ (\u0130)   | i (\u0069)     | i (\u0069)
Turkish i - ı (\u0131) | ı (\u0131)     | I (\u0049)   | ı (\u0131)     | ı (\u0131)

# Uppercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - I (\u0049) | I (\u0049)     | I (\u0049)   | i (\u0069)     | ı (\u0131)
Turkish i - İ (\u0130) | İ (\u0130)     | İ (\u0130)   | İ (\u0130)     | i (\u0069)

As you can see:

  1. Uppercasing lower case letters and lowercasing upper case letters give different results for the invariant culture and the Turkish culture.
  2. Uppercasing upper case letters and lowercasing lower case letters have no effect, no matter what the culture is.
  3. CultureInfo.InvariantCulture leaves the Turkish characters (ı and İ) as they are.
  4. ToUpper and ToLower are reversible, that is, lowercasing a character after uppercasing it brings it back to its original form, as long as the same culture was used for both operations.
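
The reversibility point in item 4 can be checked with a small console sketch. This is not part of the program that generated the table: `RoundTripDemo` and `UpperThenLower` are hypothetical names introduced only for illustration, and the sketch assumes a .NET runtime with full culture data (not globalization-invariant mode), so that the tr-TR casing rules are available.

```csharp
using System;
using System.Globalization;

public static class RoundTripDemo
{
    // Hypothetical helper: uppercases with one culture, then lowercases with another.
    public static string UpperThenLower(string text, CultureInfo upperCulture, CultureInfo lowerCulture)
    {
        return text.ToUpper(upperCulture).ToLower(lowerCulture);
    }

    public static void Main()
    {
        var turkish = new CultureInfo("tr-TR");
        var invariant = CultureInfo.InvariantCulture;

        // Same culture for both steps: the original string comes back.
        Console.WriteLine(UpperThenLower("i", turkish, turkish) == "i");         // True
        Console.WriteLine(UpperThenLower("i", invariant, invariant) == "i");     // True

        // Mixed cultures: "i" -> "I" (invariant) -> dotless "ı" U+0131 (Turkish).
        Console.WriteLine(UpperThenLower("i", invariant, turkish) == "\u0131");  // True
    }
}
```

The last line is the case to watch for: once the two operations use different cultures, the round trip no longer returns the original character.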

According to MSDN, for Char.ToUpper and Char.ToLower, Turkish and Azeri are the only affected cultures because they are the only ones with single-character casing differences. For strings, more cultures might be affected.


Source code of a console application used to generate the output:

using System;
using System.Globalization;
using System.Linq;
using System.Text;

namespace TurkishI
{
    class Program
    {
        static void Main(string[] args)
        {
            var englishI = new UnicodeCharacter('\u0069', "English i");
            var turkishI = new UnicodeCharacter('\u0131', "Turkish i");

            // Emit UTF-8 so the Turkish characters are not replaced by '?' on consoles
            // that default to a legacy code page.
            Console.OutputEncoding = Encoding.UTF8;

            Console.WriteLine("# Lowercase letters");
            Console.WriteLine("Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
            WriteToConsole(englishI);
            WriteToConsole(turkishI);

            Console.WriteLine("\n# Uppercase letters");
            var uppercaseEnglishI = new UnicodeCharacter('\u0049', "English i");
            var uppercaseTurkishI = new UnicodeCharacter('\u0130', "Turkish i");
            Console.WriteLine("Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
            WriteToConsole(uppercaseEnglishI);
            WriteToConsole(uppercaseTurkishI);

            Console.ReadKey();
        }

        // Prints one table row: the character itself, followed by its upper- and
        // lowercase forms under the invariant and the Turkish culture.
        // (The original had two identical methods, WriteUpperToConsole and
        // WriteLowerToConsole; they are merged here.)
        static void WriteToConsole(UnicodeCharacter character)
        {
            Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
                character.Description,
                character,
                character.UpperInvariant,
                character.UpperTurkish,
                character.LowerInvariant,
                character.LowerTurkish
            );
        }
    }


    class UnicodeCharacter
    {
        public static readonly CultureInfo TurkishCulture = new CultureInfo("tr-TR");

        public char Character { get; }

        public string Description { get; }

        public UnicodeCharacter(char character) : this(character, string.Empty) {  }

        public UnicodeCharacter(char character, string description)
        {
            if (description == null) {
                throw new ArgumentNullException(nameof(description));
            }

            Character = character;
            Description = description;
        }

        public string EscapeSequence => ToUnicodeEscapeSequence(Character);

        public UnicodeCharacter LowerInvariant => new UnicodeCharacter(Char.ToLowerInvariant(Character));

        public UnicodeCharacter UpperInvariant => new UnicodeCharacter(Char.ToUpperInvariant(Character));

        public UnicodeCharacter LowerTurkish => new UnicodeCharacter(Char.ToLower(Character, TurkishCulture));

        public UnicodeCharacter UpperTurkish => new UnicodeCharacter(Char.ToUpper(Character, TurkishCulture));


        // Formats the character as a \uXXXX escape. A single char is always one
        // UTF-16 code unit, so four hex digits are enough.
        private static string ToUnicodeEscapeSequence(char character)
        {
            return $"\\u{(int)character:X4}";
        }

        public override string ToString()
        {
            return $"{Character} ({EscapeSequence})";
        }
    }
}

Upvotes: 28

Tergiver

Reputation: 14517

Jon's answer is perfect. I just wanted to add that ToUpperInvariant is the same as calling ToUpper(CultureInfo.InvariantCulture).

That makes Jon's example a little simpler:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
        string invariant = "iii".ToUpper(CultureInfo.InvariantCulture);
        string cultured = "iii".ToUpper(new CultureInfo("tr-TR"));

        Application.Run(new Form {
            Font = new Font("Times New Roman", 40),
            Controls = { 
                new Label { Text = invariant, Location = new Point(20, 20), AutoSize = true }, 
                new Label { Text = cultured, Location = new Point(20, 100), AutoSize = true }, 
            }
        });
    }
}

I also used Times New Roman because it's a cooler font.

I also set the Form's Font property instead of the two Label controls because the Font property is inherited.

And I reduced a few other lines just because I like compact (example, not production) code.

I really had nothing better to do at the moment.

Upvotes: 31

Jon Skeet

Reputation: 1500795

ToUpper uses the current culture. ToUpperInvariant uses the invariant culture.

The canonical example is Turkey, where the upper case of "i" isn't "I".

Sample code showing the difference:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
        string invariant = "iii".ToUpperInvariant();
        CultureInfo turkey = new CultureInfo("tr-TR");
        Thread.CurrentThread.CurrentCulture = turkey;
        string cultured = "iii".ToUpper();

        Font bigFont = new Font("Arial", 40);
        Form f = new Form {
            Controls = {
                new Label { Text = invariant, Location = new Point(20, 20),
                            Font = bigFont, AutoSize = true},
                new Label { Text = cultured, Location = new Point(20, 100),
                            Font = bigFont, AutoSize = true }
            }
        };        
        Application.Run(f);
    }
}

For more on Turkish, see this Turkey Test blog post.

I wouldn't be surprised to hear that there are various other capitalization issues around elided characters etc. This is just one example I know off the top of my head... partly because it bit me years ago in Java, where I was upper-casing a string and comparing it with "MAIL". That didn't work so well in Turkey...
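
The "MAIL" problem can be reproduced in C# as well. The following is only a sketch of that kind of bug, not the original Java code: `MailComparisonDemo`, `IsMailCultureSensitive` and `IsMailOrdinal` are hypothetical names made up for this illustration, and it assumes a runtime with full culture data so that tr-TR casing is available.

```csharp
using System;
using System.Globalization;

public static class MailComparisonDemo
{
    // Culture-sensitive approach: the result depends on the culture used for upper-casing.
    public static bool IsMailCultureSensitive(string input, CultureInfo culture)
    {
        return input.ToUpper(culture) == "MAIL";
    }

    // Culture-independent approach: ordinal, case-insensitive comparison.
    public static bool IsMailOrdinal(string input)
    {
        return string.Equals(input, "MAIL", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var english = new CultureInfo("en-US");
        var turkish = new CultureInfo("tr-TR");

        Console.WriteLine(IsMailCultureSensitive("mail", english)); // True
        Console.WriteLine(IsMailCultureSensitive("mail", turkish)); // False: "mail" becomes "MAİL"
        Console.WriteLine(IsMailOrdinal("mail"));                   // True
    }
}
```

For identifiers, keywords and similar non-linguistic strings, comparing with StringComparison.OrdinalIgnoreCase (or upper-casing with ToUpperInvariant) sidesteps the dependence on the current culture.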

Upvotes: 183

Stefanvds

Reputation: 5916

There is no difference in English. A difference can be found only in the Turkish culture.

Upvotes: 0

CaffGeek

Reputation: 22054

Start with MSDN

http://msdn.microsoft.com/en-us/library/system.string.toupperinvariant.aspx

The ToUpperInvariant method is equivalent to ToUpper(CultureInfo.InvariantCulture)

Just because the capital of "i" is "I" in English doesn't mean it is so in every culture.
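
A console-only sketch of the equivalence quoted from MSDN, and of how the parameterless ToUpper() follows the current culture. `InvariantEquivalenceDemo` and its two helper methods are hypothetical names used only for this illustration; the sketch assumes .NET Framework 4.6 or later (or .NET Core / .NET 5+), where CultureInfo.CurrentCulture is settable, and a runtime with full culture data.

```csharp
using System;
using System.Globalization;

public static class InvariantEquivalenceDemo
{
    // Hypothetical helper: true when both ways of invariant upper-casing agree.
    public static bool InvariantFormsAgree(string text)
    {
        return text.ToUpperInvariant() == text.ToUpper(CultureInfo.InvariantCulture);
    }

    // Hypothetical helper: calls the parameterless ToUpper() while the given
    // culture is the current one, then restores the previous culture.
    public static string UpperUnderCurrentCulture(string text, CultureInfo culture)
    {
        CultureInfo previous = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = culture;
            return text.ToUpper();
        }
        finally
        {
            CultureInfo.CurrentCulture = previous;
        }
    }

    public static void Main()
    {
        Console.WriteLine(InvariantFormsAgree("iii"));                                // True
        Console.WriteLine(UpperUnderCurrentCulture("iii", new CultureInfo("en-US"))); // III

        // Under tr-TR each "i" becomes the dotted capital "İ" (U+0130).
        string turkishUpper = UpperUnderCurrentCulture("iii", new CultureInfo("tr-TR"));
        Console.WriteLine(turkishUpper == "\u0130\u0130\u0130");                      // True
    }
}
```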

Upvotes: 16
