Tunnelblick

Reputation: 147

Ruby decompose UTF-8 chars

I managed to get the Unicode code point of "ü" using

#!/usr/bin/env ruby
# encoding: UTF-8

puts "ü".unpack('U*')

It returns 252, which is fine. I read the online documentation for Ruby's String class, but I don't understand how to decompose this character.
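For clarity: `unpack('U*')` returns Unicode code points, not UTF-16 units or raw bytes. A minimal sketch contrasting the two, assuming Ruby 1.9 or later; the `"\u00FC"` escape is used instead of a literal ü so the result does not depend on how the source file itself was saved:

```ruby
# encoding: UTF-8

s = "\u00FC" # "ü", U+00FC LATIN SMALL LETTER U WITH DIAERESIS

# unpack('U*') reads the string as UTF-8 and returns Unicode code points,
# the same values that String#codepoints returns.
p s.unpack('U*')  # => [252]
p s.codepoints    # => [252]

# The underlying UTF-8 bytes are different: ü is stored as two bytes.
p s.bytes         # => [195, 188]
```

So 252 is the code point of the precomposed character; neither method splits it into a base letter and an accent.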

In the case of ü, I want to get the character u (117) and ¨ (168).

Thanks in advance; I appreciate any help.

Upvotes: 2

Views: 217

Answers (1)

mu is too short

Reputation: 434635

As ForeverZer0 mentioned in the comments, String#unpack and Array#pack are for decoding binary strings into more structured data (such as numbers) and for encoding data back into strings, respectively. To decompose Unicode characters, use String#unicode_normalize with the NFD (canonical decomposition) form:

> "ü".unicode_normalize(:nfd).chars
 => ["u", "̈"] 

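That gives you the code points 117 and 776 (U+0308, the combining diaeresis), not 168. 168 is the ISO-8859-1 byte for ¨, the standalone spacing diaeresis (U+00A8); in UTF-8 that character takes two bytes, and it is not the combining mark that NFD produces.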
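A minimal sketch of the points above, assuming Ruby 2.2 or later (where String#unicode_normalize is available). It prints the NFD code points, recomposes them with NFC, and shows why 168 only appears as a single byte under ISO-8859-1; the variable names are illustrative only:

```ruby
# encoding: UTF-8

# NFD splits "ü" (U+00FC) into a base letter plus a combining mark.
decomposed = "\u00FC".unicode_normalize(:nfd)
p decomposed.codepoints                                 # => [117, 776]
p decomposed.codepoints.map { |c| format("U+%04X", c) } # => ["U+0075", "U+0308"]

# NFC recombines the pair into the single precomposed character.
p decomposed.unicode_normalize(:nfc) == "\u00FC"        # => true

# The standalone spacing diaeresis "¨" (U+00A8) is one byte (168) in
# ISO-8859-1, but two bytes in UTF-8.
spacing = "\u00A8"
p spacing.encode("ISO-8859-1").bytes                    # => [168]
p spacing.bytes                                         # => [194, 168]
```

If you need the spacing ¨ rather than the combining mark, you would have to map U+0308 to U+00A8 yourself; normalization will not do that for you.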

Upvotes: 5
