jamone

Reputation: 17421

Convert NSString with unicode characters into valid HTML

I am getting a string from an API that has anchor tags in it, so I am creating an NSAttributedString from it, and displaying it in a UITextView so I can support tappable links.

The problem is that the incoming string isn't valid HTML: it contains unescaped Unicode characters, such as the em dash (U+2014) in the example below.

While I could handle those specific cases, I'm concerned about any other Unicode characters that may come in that I don't yet know about.

Example:

NSString *fromAPI = @"Reagan \u2014 saying"; // \u takes 4 hex digits; \U requires 8
NSDictionary *options = @{NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType};
NSData *data = [fromAPI dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:NO];
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];

This renders in the UITextView with the em dash garbled.

How do I get it to render the em dash and other unicode properly?

Upvotes: 6

Views: 882

Answers (2)

Neeku

Reputation: 3653

What I was going to suggest (if I've understood the question correctly) was to use a regex or something similar to append the variation selector \U0000FE0E (or just \uFE0E) after every unescaped Unicode character, e.g.:

NSString *fromAPI = @"Reagan \u2014 saying";
NSString *convertedFromAPI = @"Reagan \u2014\uFE0E saying";

But I think what you are doing at the moment makes more sense.

Upvotes: -1

jamone

Reputation: 17421

Found it. It looks like the HTML importer won't decode the Unicode characters correctly unless the document declares its charset, so add this to the <head>:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
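In case it helps, here is a rough sketch of how that tag could be applied to the string from the question. The helper name AttributedStringFromHTMLFragment is made up for illustration, and wrapping the fragment in <html>, <head> and <body> is my own assumption about how to give the tag a <head> to live in; the answer only says the tag has to be there.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper (not from the original post): wraps an HTML fragment in a
// minimal document whose <head> declares UTF-8, then converts it to an
// NSAttributedString using the same options as the question.
static NSAttributedString *AttributedStringFromHTMLFragment(NSString *fragment) {
    // Assumption: a plain html/head/body wrapper is enough for the importer.
    NSString *html = [NSString stringWithFormat:
        @"<html><head>"
        @"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />"
        @"</head><body>%@</body></html>", fragment];

    // The bytes are UTF-8, matching the charset declared in the meta tag.
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    NSDictionary *options = @{NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType};

    NSError *error = nil;
    NSAttributedString *result =
        [[NSAttributedString alloc] initWithData:data
                                         options:options
                              documentAttributes:nil
                                           error:&error];
    if (result == nil) {
        // Surface the failure instead of passing nil for the error parameter.
        NSLog(@"HTML conversion failed: %@", error);
    }
    return result;
}
```

Calling it as AttributedStringFromHTMLFragment(@"Reagan \u2014 saying") and assigning the result to the text view's attributedText should then show the em dash. As far as I know, the HTML importer is meant to be used on the main thread, so this is best called from there.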

Upvotes: 7
