velokw

Reputation: 33

NSString substringWithRange returns incorrect substring

I'm working on an OS X app, running Xcode 6.4 on Yosemite. Distilling the problem down to a couple of lines of code: I'm using substringWithRange: to extract a substring and getting a string that's 18 characters long, but I was expecting a string with 26 characters. What am I doing wrong?

//              12345678901234567890123456789
NSString *s = @"ClientÅåÄäÖöÅåÆæØø_Example #2";
NSRange range = NSMakeRange(0, 26);
NSString *result = [s substringWithRange:range];
//              12345678901234567890123456
//              ClientÅåÄäÖöÅåÆæØø

EDIT: I added an NSLog to show only the first 18 characters are output and took a screenshot, but SO says I need 10 reputation points to attach an image. Let's try this: https://i.sstatic.net/A1u5J.jpg. I'm not making this up, the output of NSLog shows 18 characters (as does the window with Locals showing the contents of result).

EDIT: It gets even better. I copied the string constant from the question above and pasted it back into my code in a second block. https://i.sstatic.net/ZHcgI.jpg. It seems that even though the two strings s and s2 look identical, they are somehow not the same. How can I figure out what's wrong with the first string constant? The app needs to handle whatever Unicode strings are thrown at it.

EDIT: I added some code to check for equality, check lengths, and print each character as follows:

//              12345678901234567890123456789
NSString *s = @"ClientÅåÄäÖöÅåÆæØø_Example #2";    
NSString *s2 = @"ClientÅåÄäÖöÅåÆæØø_Example #2";

NSLog(@"isEqualToString is %d", [s isEqualToString:s2]);

NSLog(@"lengths are %lu\t%lu\n", [s length], [s2 length]);
for(unsigned long n = 0; n < [s length]; n++)
    NSLog(@"%@\t%@\n",
          n < [s length] ? [NSString stringWithFormat:@"%u", [s characterAtIndex:n]] : @"",
          n < [s2 length] ? [NSString stringWithFormat:@"%u", [s2 characterAtIndex:n]] : @"");

Which gives:

isEqualToString is 0
lengths are 37  29
67  67
108 108
105 105
101 101
110 110
116 116
65  197
778 229
97  196
778 228
65  214
776 246
97  197
776 229
79  198
776 230
111 216
776 248
65  95
778 69
97  120
778 97
198 109
230 112
216 108
248 101
95  32
69  35
120 50
97  
109 
112 
108 
101 
32  
35  
50  

Upvotes: 1

Views: 335

Answers (2)

picciano

Reputation: 22711

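This is rather sad, actually. Ranges of an NSString do NOT refer to Unicode code points or to visible characters; they count UTF-16 code units. In this case, each accented character is stored as two code units (a base letter followed by a combining mark), so it counts as two.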

This answer shows how to do it correctly: Berry Blue's answer
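
As a rough illustration (not necessarily what the linked answer does), one way to take the first N visible characters is to walk the string by composed character sequences instead of by index. The helper name `PrefixOfComposedCharacters` is hypothetical; the enumeration method and option are standard Foundation API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): returns the first `count`
// user-perceived characters (composed character sequences) of `string`,
// or the whole string if it contains fewer than `count`.
static NSString *PrefixOfComposedCharacters(NSString *string, NSUInteger count) {
    if (count == 0) {
        return @"";
    }
    __block NSUInteger seen = 0;  // sequences visited so far
    __block NSUInteger end = 0;   // UTF-16 index just past the last sequence kept
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        end = NSMaxRange(substringRange);
        if (++seen == count) {
            *stop = YES;  // enough sequences collected
        }
    }];
    return [string substringToIndex:end];
}
```

For example, with `@"A\u030Aa\u030A_Example"` (each letter followed by a combining ring), asking for 2 characters returns a string whose `length` is 4, because each visible character occupies two UTF-16 code units.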

Upvotes: 1

gnasher729

Reputation: 52632

What you get isn't always what you see. It is quite possible that you managed to put some of the more "interesting" Unicode characters into your string, for example a zero-width no-break space character (U+FEFF), which is totally invisible.

I'd print out the length of the string and characterAtIndex:i for all characters in the string and check what's really in it.
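
If such a dump shows base letters followed by combining marks (code units such as 776 or 778), the string is in decomposed form. A minimal sketch, assuming canonical normalization is acceptable for the app: compare the precomposed forms of both strings rather than the raw strings. The helper name `CanonicallyEqual` is hypothetical; `precomposedStringWithCanonicalMapping` is an existing NSString method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the two strings are identical after both are
// converted to precomposed canonical form (Unicode NFC).
static BOOL CanonicallyEqual(NSString *a, NSString *b) {
    NSString *na = [a precomposedStringWithCanonicalMapping];
    NSString *nb = [b precomposedStringWithCanonicalMapping];
    return [na isEqualToString:nb];
}
```

With `@"A\u030A"` ('A' plus a combining ring, length 2) and `@"\u00C5"` (precomposed 'Å', length 1), `isEqualToString:` on the raw strings returns NO, while `CanonicallyEqual` returns YES. Normalizing before calling `substringWithRange:` also makes the indices line up with the precomposed letters, though characters that have no precomposed form still span several code units.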

Upvotes: 1
