Slee

Reputation: 28248

Remove non-ASCII characters from NSString in Objective-C

I have an app that syncs data from a remote DB that users populate. It seems people copy and paste text from a ton of different OSes and programs, which can cause hidden non-ASCII characters to be imported into the system.

For example I end up with this:

Artist:â â Ioco

This ends up getting sent back into the system during sync, my JSON conversion compounds the problem, and the invalid characters in various places cause my app to crash.

How do I search for and clean out any of these invalid characters?

Upvotes: 6

Views: 10835

Answers (2)

NSGod

Reputation: 22948

A simpler version of Morten Fast's answer:

NSString *test = @"Olé, señor!";

NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet 
           characterSetWithRange:NSMakeRange(32, 127 - 32)] invertedSet];

test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] 
                          componentsJoinedByString:@""];

NSLog(@"%@", test); // Prints @"Ol, seor!"

Notably, this uses NSCharacterSet's +characterSetWithRange: method to simply specify the desired ASCII range rather than having to create a string, etc.

The results are identical, as comparing one to the other with isEqual: returns YES.
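
For reuse, the same approach could be wrapped in a small helper. This is only a minimal sketch: the function name StripNonPrintableASCII is hypothetical (it is not a Foundation API), and it assumes the same printable range of 32 through 126 used above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns a copy of `input`
// containing only printable ASCII characters (code points 32 through 126).
static NSString *StripNonPrintableASCII(NSString *input) {
    // Build the set of allowed characters, then invert it to get
    // everything that should be removed.
    NSCharacterSet *allowed =
        [NSCharacterSet characterSetWithRange:NSMakeRange(32, 127 - 32)];
    NSCharacterSet *disallowed = [allowed invertedSet];

    // Split on every disallowed character and rejoin the remaining pieces.
    return [[input componentsSeparatedByCharactersInSet:disallowed]
               componentsJoinedByString:@""];
}
```

Note that because the range starts at 32, this also drops tabs and newlines, so a multi-line field would be flattened. If that matters, the allowed set would need to be widened.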

Upvotes: 1

Morten Fast

Reputation: 6320

While I strongly believe that supporting Unicode is the right way to go, here's an example of how you can limit a string to only contain certain characters (in this case, printable ASCII):

NSString *test = @"Olé, señor!";

NSMutableString *asciiCharacters = [NSMutableString string];
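// Collect the printable ASCII characters (32 through 126).
// The counter is an int because %c expects an int argument.
for (int i = 32; i < 127; i++)  {
    [asciiCharacters appendFormat:@"%c", i];
}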

NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];

test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];

NSLog(@"%@", test); // Prints @"Ol, seor!"
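
For the "search" half of the question, it may help to check whether a string contains anything outside that range before stripping it, for example to log which records are affected. A minimal sketch, assuming the same 32 through 126 range; the function name ContainsNonPrintableASCII is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): YES if `input` contains any
// character outside the printable ASCII range (32 through 126).
static BOOL ContainsNonPrintableASCII(NSString *input) {
    // A nil or empty string has nothing to flag.
    if (input.length == 0) {
        return NO;
    }

    NSCharacterSet *disallowed =
        [[NSCharacterSet characterSetWithRange:NSMakeRange(32, 127 - 32)]
            invertedSet];

    // rangeOfCharacterFromSet: reports NSNotFound when nothing matches.
    NSRange hit = [input rangeOfCharacterFromSet:disallowed];
    return hit.location != NSNotFound;
}
```

The location of the returned range is the index of the first offending character, which could be logged to track down where the bad input came from.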

Upvotes: 22
