NullPointerException

Reputation: 37689

Problem parsing Strings with Russian chars

I'm using an old Objective-C routine (let's call it oldObjectiveCFunction) that parses a string by analyzing each character. After analyzing the characters, it splits the string into substrings and returns them in an array called functions. Here is a heavily reduced sample of how that old function parses the string:

NSMutableArray *functions = [NSMutableArray new];
NSMutableArray *components = [NSMutableArray new];
NSMutableString *sb = [NSMutableString new];
char c;
NSUInteger sourceLen = source.length;
NSUInteger index = 0;

while (index < sourceLen) {
    c = [source characterAtIndex:index];
    //here do some random work analyzing the char
    [sb appendString:[NSString stringWithFormat:@"%c",c]];
    if (some condition){
        [components addObject:(NSString *)sb];
        sb = [NSMutableString new];
        [functions addObject:[components copy]];
    }
    index++;
}

Later, I read each string in functions with this Swift code:

let functions = oldObjectiveCFunction(string) as? [[String]]
functions?.forEach({ (function) in
    let functionCopy = function.map { $0 }
    for index in 0..<functionCopy.count {
        let string = functionCopy[index]
    }
})

The problem is that it works perfectly with ordinary strings, but if the string contains Russian names, like this:

РАЦИОН

the output (the content of my string constant) is this:

 \u{10}&\u{18}\u{1e}\u{1d}

How can I get the same Russian string instead of that?

I tried doing this:

let string2 = String(describing: string?.cString(using: String.Encoding.utf8))

but it returns an even stranger result:

"Optional([32, 16, 38, 24, 30, 29, 0])" 

Upvotes: 0

Views: 193

Answers (2)

JosefZ

Reputation: 30153

Analysis. Sorry, I don't speak Swift or Objective-C, so the following example is in Python; however, the 4th and 5th columns (the Unicode code point reduced to 8 bits) reproduce the weird numbers in your question.

for ch in 'РАЦИОН':
    print(ch,                           # character itself
          ord(ch),                      # character's Unicode code point in decimal
          '{:04x}'.format(ord(ch)),     # code point in hexadecimal
          ord(ch) & 0xFF,               # code point reduced to 8 bits, decimal
          '{:02x}'.format(ord(ch) & 0xFF))  # code point reduced to 8 bits, hexadecimal

Output:

Р 1056 0420 32 20
А 1040 0410 16 10
Ц 1062 0426 38 26
И 1048 0418 24 18
О 1054 041e 30 1e
Н 1053 041d 29 1d
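To connect the table to the output in the question, here is a small Python sketch. The helper name truncate_to_char is hypothetical; it only simulates storing each character in an 8-bit char. Rebuilding the characters from the low bytes alone gives exactly the garbled text from the question: a space (0x20) followed by \u{10}&\u{18}\u{1e}\u{1d}.

```python
# Hypothetical simulation of `char c`: only the low 8 bits of each
# UTF-16 code unit survive. For these BMP characters, ord() equals that unit.
def truncate_to_char(text):
    return ''.join(chr(ord(ch) & 0xFF) for ch in text)

garbled = truncate_to_char('РАЦИОН')
print(repr(garbled))  # ' \x10&\x18\x1e\x1d'
```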

Solution. Hence, you need to fix every place in your code that reduces 16-bit values to 8 bits:
first, declare unichar c; instead of char c; at the 4th line,
and use [sb appendString:[NSString stringWithFormat:@"%C",c]]; at the 11th line. Note the difference between

  • the uppercase C in the %C specifier: 16-bit UTF-16 code unit (unichar), and
  • the lowercase c in the %c specifier: 8-bit unsigned character (unsigned char).
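The effect of this fix can be simulated in Python as well. This is a hypothetical model, not Objective-C: rebuild() walks the UTF-16 code units the way the characterAtIndex: loop does, and its bits parameter stands in for the width of c (8 for char, 16 for unichar).

```python
# Hypothetical simulation of the characterAtIndex: loop from the question.
# `bits` models the width of the variable `c`: 8 for `char`, 16 for `unichar`.
def rebuild(source, bits):
    data = source.encode('utf-16-le')
    # characterAtIndex: returns one UTF-16 code unit (2 bytes each here)
    units = [int.from_bytes(data[i:i + 2], 'little') for i in range(0, len(data), 2)]
    mask = (1 << bits) - 1
    kept = [u & mask for u in units]  # what actually survives in `c`
    # Reassemble the kept code units, like appending each one to sb
    return b''.join(u.to_bytes(2, 'little') for u in kept).decode('utf-16-le')

print(repr(rebuild('РАЦИОН', 8)))  # only the low byte of each unit survives
print(rebuild('РАЦИОН', 16))       # РАЦИОН
```

Calling rebuild() on a plain ASCII string returns it unchanged even with bits=8, which is consistent with the question's observation that ordinary strings parse fine.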

Resources. My answer is based on answers to the following questions at SO:

Upvotes: 1

Moose

Reputation: 2737

Your last result is not strange. The Optional comes from string? (cString(using:) itself also returns an optional), and cString(using:) returns an array of CChar (Int8), ending with the 0 that terminates a C string.
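A quick Python check suggests why those numbers point to damage that happened before cString(using:) was called. The helper as_cchar_array is hypothetical: it takes the UTF-8 bytes, appends the terminating 0, and reinterprets each byte as a signed 8-bit value, assuming CChar is Int8 as stated above. The UTF-8 bytes of the real Russian word are all 0x80 or above, so they would show up as negative CChar values; the observed array contains only small positive values.

```python
# Hypothetical helper: UTF-8 bytes plus the NUL terminator,
# each reinterpreted as a signed 8-bit value (CChar == Int8 assumed).
def as_cchar_array(text):
    raw = list(text.encode('utf-8')) + [0]
    return [b - 256 if b > 127 else b for b in raw]

print(as_cchar_array('РАЦИОН'))                # two negative values per letter, then 0
print(as_cchar_array(' \x10&\x18\x1e\x1d'))    # [32, 16, 38, 24, 30, 29, 0]
```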

I think the problem comes from here, but I'm not sure, because the whole thing looks confusing:

[sb appendString:[NSString stringWithFormat:@"%c",c]];

Have you tried:

[sb appendString: [NSString stringWithCString:c encoding:NSUTF8StringEncoding]];

instead of stringWithFormat?

(The solution of using %C instead of %c, proposed by your commenters, looks like a good idea too.) Oops, I just saw that you have tried it without success.

Upvotes: 0
