Raj

Reputation: 6025

Incorrect decoding of known UTF-8 string from server

In my application, I am getting some string values from a server, but I'm not ending up with the right string.

The server sends the string بسيط, but what I receive is Ø¨Ø³ÙØ·

I tried to test the response string in an online decoder:

http://www.cafewebmaster.com/online_tools/utf8_encode

It is UTF-8 encoded, but I couldn't decode the string on the iPhone side.

I looked at these Stack Overflow questions for reference:

Converting escaped UTF8 characters back to their original form
unicode escapes in objective-c
utf8_decode for objective-c

but none of them helped.

Upvotes: 1

Views: 1179

Answers (3)

Raj

Reputation: 6025

I solved the issue with the help of this link:

Different kind of UTF8 decoding in NSString

NSString *string = @"Ø¨Ø³ÙØ·";

I used this method:

[NSString stringWithUTF8String:(char*)[string cStringUsingEncoding:NSISOLatin1StringEncoding]]
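The one-liner above raises an exception if cStringUsingEncoding: returns NULL, which happens when the string contains a character outside ISO-8859-1. A slightly more defensive sketch of the same idea follows; the helper name RepairLatin1Mojibake is hypothetical, and the code assumes ARC and the Foundation framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): undoes the damage
// when UTF-8 bytes were mistakenly decoded as ISO-8859-1.
// Returns nil if the input cannot be repaired this way.
static NSString *RepairLatin1Mojibake(NSString *garbled) {
    // Map each character back to its single byte. This returns nil if any
    // character has no ISO-8859-1 representation.
    NSData *rawBytes = [garbled dataUsingEncoding:NSISOLatin1StringEncoding];
    if (rawBytes == nil) {
        return nil;
    }
    // Reinterpret the recovered bytes as UTF-8. This returns nil if the
    // bytes are not valid UTF-8.
    return [[NSString alloc] initWithData:rawBytes
                                 encoding:NSUTF8StringEncoding];
}
```

Calling RepairLatin1Mojibake(responseString) returns the repaired text, or nil when the string was not mis-decoded in this particular way. This treats the symptom only; decoding the server response as UTF-8 in the first place avoids the need for it.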

Thank You.

Upvotes: 0

antf

Reputation: 3222

Two points are unclear from your question:

  1. Do you have access to the server-side code?
  2. How do you send data to and receive data from the server?

For the first point, I will assume that the server sends text encoded as UTF-8.

Now, on the iPhone, if you are sending data to the server over sockets, use the following:

NSString *messageToSend = @"The text in the language you like";
const uint8_t *str = (uint8_t *) [messageToSend cStringUsingEncoding:NSUTF8StringEncoding];
[self writeToServer:str];

Here, writeToServer is your own method that sends the data to the server.

If you want to store the data in a SQLite3 database, use:

sqlite3_bind_text(statement, 2, [@"The text in the language you like" UTF8String], -1, NULL);

If you are receiving data from the server (again over sockets), do the following:

[rowData appendBytes:(const void *)buf length:len];
NSString *strRowData = [[NSString alloc] initWithData:rowData encoding:NSUTF8StringEncoding];
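One detail worth handling when reading from a socket: initWithData:encoding: returns nil when the buffered bytes are not valid UTF-8, which can happen when a multi-byte character is split across two reads. A minimal sketch of the receive step with that check follows; the helper name AppendChunkAndDecode is hypothetical, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: appends one received chunk to the buffer and tries
// to decode everything received so far as UTF-8.
// Returns nil if the buffered bytes are not (yet) valid UTF-8, for example
// when a multi-byte character is split across two reads.
static NSString *AppendChunkAndDecode(NSMutableData *rowData,
                                      const uint8_t *buf,
                                      NSUInteger len) {
    [rowData appendBytes:buf length:len];
    return [[NSString alloc] initWithData:rowData
                                 encoding:NSUTF8StringEncoding];
}
```

If the result is nil, keep rowData and call the helper again with the next chunk rather than falling back to another encoding such as ISO-8859-1.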

I hope this covers all the cases you need.

Upvotes: 1

mvds

Reputation: 47104

Without seeing any source code it is hard to say anything conclusive, but at some point you are interpreting a UTF-8 encoded string as ISO-8859-1 and then (wrongly) converting it to UTF-8:

Analysis for string 'بسيط':

  • raw length: 8
  • logical length: 4
  • raw bytes: 0xD8 0xA8 0xD8 0xB3 0xD9 0x8A 0xD8 0xB7
  • interpreted as ISO-8859-1 (Ø¨Ø³ÙØ·) and re-encoded as UTF-8: 0xC3 0x98 0xC2 0xA8 0xC3 0x98 0xC2 0xB3 0xC3 0x99 0xC2 0x8A 0xC3 0x98 0xC2 0xB7

So somewhere in your code there is probably a reference to ISO-8859-1. Find it and remove it.
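The figures in the analysis can be reproduced with a short sketch. The helper names below are hypothetical and only simulate the suspected bug; the code assumes ARC and the Foundation framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: simulates the bug by reading UTF-8 bytes as if they
// were ISO-8859-1, so each byte becomes one character.
static NSString *MisreadUTF8AsLatin1(NSString *original) {
    NSData *utf8Bytes = [original dataUsingEncoding:NSUTF8StringEncoding];
    return [[NSString alloc] initWithData:utf8Bytes
                                 encoding:NSISOLatin1StringEncoding];
}

// Hypothetical helper: the bytes produced when the misread string is
// encoded as UTF-8 again (the "double-encoded" form).
static NSData *DoubleEncodedBytes(NSString *original) {
    return [MisreadUTF8AsLatin1(original)
            dataUsingEncoding:NSUTF8StringEncoding];
}

// Logs the logical length, the raw UTF-8 bytes, and the double-encoded bytes.
static void LogEncodingAnalysis(NSString *original) {
    NSData *raw = [original dataUsingEncoding:NSUTF8StringEncoding];
    NSData *doubled = DoubleEncodedBytes(original);
    NSLog(@"logical length (UTF-16 units): %lu",
          (unsigned long)[original length]);
    NSLog(@"raw UTF-8 bytes (%lu): %@",
          (unsigned long)[raw length], raw);
    NSLog(@"misread as ISO-8859-1, re-encoded as UTF-8 (%lu): %@",
          (unsigned long)[doubled length], doubled);
}
```

Calling LogEncodingAnalysis(@"بسيط") should report a logical length of 4, 8 raw bytes, and 16 double-encoded bytes, matching the byte sequences listed above.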

Upvotes: 0
