vpv

Reputation: 938

Why is "White Space" not detected in a string in C#?

I am confused by this error. Below are the two strings I am comparing; they look exactly the same to the naked eye, but when I compare them in C# code or in MS Excel, the result is "Mismatch".

1st: Frillestads_församling_Länsräkenskaper efter 1917. Mantalslängder 1918-1991 Special 99

2nd: Frillestads_församling_Länsräkenskaper efter 1917. Mantalslängder 1918-1991 Special 99  

Even when I tried to split them into a string array on a single space (' '), the 1st string was not split.

Here is the C# code:

    private void btnFindMismatch_Click(object sender, EventArgs e)
    {
        string value1 = FormattedString(txtFirstValue.Text);
        string value2 = FormattedString(txtSecondValue.Text);

        bool isMismatchFound = false;

        string[] value1Array = value1.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] value2Array = value2.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < value1Array.Length; i++)
        {
            if(value1Array[i].Equals(value2Array[i]) == false)
            {
                lblResult.Text = string.Format("Mismatch in index: {0}; 1st Char: {1}; 2nd Char: {2}", i, value1Array[i], value2Array[i]);
                isMismatchFound = true;

                break;
            }
        }

        if(!isMismatchFound)
        {
            lblResult.Text = "No Mismatch";
        }

        MessageBox.Show("Complete");
    }

    private string FormattedString(string value)
    {
        RegexOptions options = RegexOptions.None;
        Regex regex = new Regex(@"[ ]{2,}", options);
        value = regex.Replace(value, @" ");

        return value;
    }  

I then checked the 1st value in Notepad++ and found that the 1st string does not contain any "White Space".

Please see the screenshots below for a clearer view.

C# Code output

Notepad++ Search output

Upvotes: 1

Views: 1924

Answers (1)

Tim Pietzcker

Reputation: 336108

It appears that those aren't normal spaces (0x20) but perhaps non-breaking spaces (0xA0). If you use the whitespace shorthand \s instead of a literal space character, it should work.

    Regex regex = new Regex(@"\s{2,}", options); // for example
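The Split call in the question has the same limitation, since it only splits on ' ' (0x20). A minimal sketch of one possible workaround, assuming the separators really are non-breaking spaces: passing a null separator array makes Split use every character for which char.IsWhiteSpace returns true. The class name, method name and sample string are made up for illustration.

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // Splits on every character for which char.IsWhiteSpace returns true,
    // which includes U+00A0 (non-breaking space) as well as U+0020.
    public static string[] SplitOnAnyWhitespace(string value)
    {
        return value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Hypothetical sample: the words are separated by non-breaking spaces.
        string sample = "efter\u00A01917.\u00A0Special";

        // Splitting on a plain space finds no separator at all.
        Console.WriteLine(sample.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length); // prints 1

        // Splitting on any whitespace yields the three words.
        Console.WriteLine(SplitOnAnyWhitespace(sample).Length); // prints 3
    }
}
```

Like \s, this also splits on tabs and newlines, so it may be too broad if those need to be preserved.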

Note that \s will also match newlines, tabs and other whitespace - so perhaps you want to make the regex more specific, depending on which space character is actually being used (Notepad++ probably has a hexadecimal mode that will allow you to check which one it is exactly):

    Regex regex = new Regex(@"[ \xa0]{2,}", options);
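If a hex view is not at hand, the character can also be identified from C# itself. The following is a rough sketch rather than a definitive tool: it lists every whitespace character in a string that is not a plain space, together with its index and code point. WhitespaceInspector, FindUnusualWhitespace and the sample string are hypothetical names and data chosen for this example.

```csharp
using System;
using System.Collections.Generic;

public static class WhitespaceInspector
{
    // Describes every whitespace character that is not a plain space (U+0020).
    public static List<string> FindUnusualWhitespace(string value)
    {
        var findings = new List<string>();
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (char.IsWhiteSpace(c) && c != ' ')
            {
                // Report the position and the code point in hexadecimal.
                findings.Add(string.Format("index {0}: U+{1:X4}", i, (int)c));
            }
        }
        return findings;
    }

    public static void Main()
    {
        // Hypothetical sample: one non-breaking space, then one normal space.
        string sample = "efter\u00A01917. Special";
        foreach (string finding in FindUnusualWhitespace(sample))
        {
            Console.WriteLine(finding); // prints "index 5: U+00A0"
        }
    }
}
```

Running this over txtFirstValue.Text would show which code point to put into the character class above, assuming the odd character is one that char.IsWhiteSpace recognises.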

Upvotes: 2
