Ergibt Sinn

Reputation: 181

C# string.IndexOf() returns unexpected value

This question applies to C#, .NET Compact Framework 2 and Windows CE 5 devices.

I encountered a bug in a .NET DLL that has been in use on very different CE devices for years without showing any problems. Suddenly, on a new Windows CE 5.0 device, the bug appeared in the following code:

string s = "Print revenue receipt"; // has only single space chars 
int i = s.IndexOf("  "); // two space chars

I expect i to be -1. That held until today, when IndexOf suddenly returned 5.

This behaviour doesn't occur when using

int i = s.IndexOf("  ", StringComparison.Ordinal);

so I'm quite sure that this is a culture-based phenomenon, but I can't see what difference this new device makes. It is a mostly identical version of a known device (just a faster CPU and a new board).

Both devices:

The new device had CF 3.5 preinstalled. I experimentally renamed its GAC files, with no change in the described behaviour. Since version 2.0.7045.0 is always reported at runtime, I assume these assemblies have no effect.

Although this is not difficult to fix, I can't stand it when things seem this magical. Any hints as to what I am missing?

Edit: it is getting stranger and stranger, see screenshot: screenshot

One more: screenshot

Upvotes: 18

Views: 5180

Answers (3)

wborgsm

Reputation: 186

The reference at http://msdn.microsoft.com/en-us/library/k8b1470s.aspx states:

"Character sets include ignorable characters, which are characters that are not considered when performing a linguistic or culture-sensitive comparison. In a culture-sensitive search, if value contains an ignorable character, the result is equivalent to searching with that character removed."

This is from the 4.5 reference; the references for previous versions contain nothing like that.

So let me take a guess: they changed the rules from 4.0 to 4.5, and now the second space of a two-space sequence is considered to be an "ignorable character", at least if the engine recognizes your string as English text (like your example string s), otherwise not.

And somehow, on your new device, a 4.5 DLL is used instead of the expected 2.0 DLL.

A wild guess, I know :)
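One way to test this guess on the device itself is to ask the comparison engine whether a given string compares equal to the empty string, which is what "ignorable" amounts to in practice. The following is only a minimal sketch: the class `IgnorableCheck` and the helper `ComparesEqualToEmpty` are hypothetical names, and it assumes `CompareInfo.Compare(string, string, CompareOptions)` is available on the target framework (it is on the desktop framework). The linguistic results depend on the platform's collation data, so no expected output is given for them.

```csharp
using System;
using System.Globalization;

public static class IgnorableCheck
{
    // Hypothetical helper: true when 'text' compares equal to the empty string
    // under the given culture and options, i.e. the comparison ignores it.
    public static bool ComparesEqualToEmpty(string text, CultureInfo culture, CompareOptions options)
    {
        return culture.CompareInfo.Compare(text, string.Empty, options) == 0;
    }

    public static void Main()
    {
        CultureInfo culture = CultureInfo.CurrentCulture;

        // One space, two spaces, a soft hyphen (U+00AD) and a plain letter.
        string[] samples = { " ", "  ", "\u00AD", "a" };

        foreach (string sample in samples)
        {
            // Linguistic result: varies with the platform's collation tables.
            bool linguistic = ComparesEqualToEmpty(sample, culture, CompareOptions.None);
            // Ordinal result: always false for a non-empty string.
            bool ordinal = ComparesEqualToEmpty(sample, culture, CompareOptions.Ordinal);

            Console.WriteLine("U+{0:X4} x{1}: linguistic={2}, ordinal={3}",
                (int)sample[0], sample.Length, linguistic, ordinal);
        }
    }
}
```

If the line for the space character reports `linguistic=True` on the new device but not on the old ones, that would point to differing collation data rather than a different framework DLL.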

Upvotes: 0

Eric Beaulieu

Reputation: 1064

I believe you already have the answer: use an ordinal search.

    int i = s.IndexOf("  ", StringComparison.Ordinal);

You can read a small section in the documentation for the String Class which has this to say on the subject:

"String search methods, such as String.StartsWith and String.IndexOf, also can perform culture-sensitive or ordinal string comparisons. The following example illustrates the differences between ordinal and culture-sensitive comparisons using the IndexOf method. A culture-sensitive search in which the current culture is English (United States) considers the substring "oe" to match the ligature "œ". Because a soft hyphen (U+00AD) is a zero-width character, the search treats the soft hyphen as equivalent to Empty and finds a match at the beginning of the string. An ordinal search, on the other hand, does not find a match in either case."
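The difference described in that passage can be reproduced with a small sketch along these lines. The class and method names (`OrdinalVersusCulture`, `FindOrdinal`, `FindLinguistic`) and the sample strings are made up for illustration; the code assumes a desktop-style runtime where the "en-US" culture can be created. Only the ordinal results are annotated, because the culture-sensitive results depend on the collation data of the platform the code runs on.

```csharp
using System;
using System.Globalization;

public static class OrdinalVersusCulture
{
    // Ordinal search: compares UTF-16 code units, no linguistic rules applied.
    public static int FindOrdinal(string source, string value)
    {
        return source.IndexOf(value, StringComparison.Ordinal);
    }

    // Culture-sensitive search using the collation rules of 'culture'.
    public static int FindLinguistic(string source, string value, CultureInfo culture)
    {
        return culture.CompareInfo.IndexOf(source, value, CompareOptions.None);
    }

    public static void Main()
    {
        CultureInfo enUs = new CultureInfo("en-US");

        string ligature = "\u0153uvre";           // starts with the ligature U+0153
        string softHyphen = "co\u00ADoperative";  // soft hyphen at index 2
        string question = "Print revenue receipt";

        // Ordinal results are the same on every platform.
        Console.WriteLine(FindOrdinal(ligature, "oe"));        // -1
        Console.WriteLine(FindOrdinal(softHyphen, "\u00AD"));  // 2
        Console.WriteLine(FindOrdinal(question, "  "));        // -1

        // Culture-sensitive results are platform dependent, so none are claimed here.
        Console.WriteLine(FindLinguistic(ligature, "oe", enUs));
        Console.WriteLine(FindLinguistic(softHyphen, "\u00AD", enUs));
        Console.WriteLine(FindLinguistic(question, "  ", enUs));
    }
}
```

Running the last three lines on both the old and the new device should show whether the two devices disagree only in the culture-sensitive case.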

Upvotes: 4

Jan Dörrenhaus

Reputation: 6717

Culture stuff can really appear to be quite magical on some systems. After years of pain, I now always set the culture information explicitly to InvariantCulture wherever I do not explicitly want different behaviour for different cultures. So my suggestion would be: make that IndexOf check always use the same culture information, like so:

int i = s.IndexOf("  ", StringComparison.InvariantCulture);
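Before settling on one mode, it may help to run the same search under several `StringComparison` values on the affected device and compare the results. This is a rough sketch with hypothetical names (`ComparisonProbe`, `Probe`). Note that `StringComparison.InvariantCulture` is still a linguistic comparison, just with fixed rules, so whether it avoids the problem on a particular device is an assumption that should be checked rather than taken for granted.

```csharp
using System;

public static class ComparisonProbe
{
    // The modes to probe, listed explicitly so the result order is fixed.
    public static readonly StringComparison[] Modes =
    {
        StringComparison.CurrentCulture,
        StringComparison.InvariantCulture,
        StringComparison.Ordinal
    };

    // Returns the IndexOf result for each entry in Modes, in the same order.
    public static int[] Probe(string source, string value)
    {
        int[] results = new int[Modes.Length];
        for (int i = 0; i < Modes.Length; i++)
        {
            results[i] = source.IndexOf(value, Modes[i]);
        }
        return results;
    }

    public static void Main()
    {
        string s = "Print revenue receipt"; // only single spaces
        int[] results = Probe(s, "  ");     // search for two spaces

        for (int i = 0; i < Modes.Length; i++)
        {
            // Only the Ordinal line is guaranteed to print -1.
            Console.WriteLine("{0}: {1}", Modes[i], results[i]);
        }
    }
}
```

If the InvariantCulture line differs from the Ordinal line on the new device, pinning the culture alone would not be enough and the ordinal overload would be the safer choice.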

Upvotes: 0
