I went as far as searching the C sources, but I can't find this function, and I really don't want to write one myself because it must already exist.
To elaborate: Unicode code points are written as U+XXXX, and those are easy to get. What I need is the byte sequence that the character is written to a file as (for example). UTF-8 turns a code point into one to four bytes: code points up to U+007F fit in a single byte with 7 payload bits, while larger ones are split across a lead byte and continuation bytes that carry 6 bits each. Emacs certainly knows how to do this, but I can't find a way to get the UTF-8 encoding of a string from it as a sequence of 8-bit bytes.
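For a concrete picture of the UTF-8 bit layout, here is a rough sketch of the two-byte case worked by hand: the lead byte has the form 110xxxxx and the continuation byte 10xxxxxx. The helper name my-utf8-two-byte is made up for this illustration, and it only handles code points U+0080 through U+07FF, so it is not a general encoder.

```elisp
;; Hypothetical helper, for illustration only: it covers just the
;; two-byte range of UTF-8 (U+0080..U+07FF).
(defun my-utf8-two-byte (code-point)
  "Return the two UTF-8 bytes of CODE-POINT as a list of integers.
CODE-POINT must be in the range #x80..#x7FF."
  (list
   ;; Lead byte 110xxxxx: the top 5 of the 11 payload bits.
   (logior #xC0 (ash code-point -6))
   ;; Continuation byte 10xxxxxx: the low 6 payload bits.
   (logior #x80 (logand code-point #x3F))))

(my-utf8-two-byte #x125) ; U+0125 is ĥ => (196 165)
```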
Functions such as get-byte or multibyte-char-to-unibyte work only with characters that can be represented using no more than 8 bits. I need the same thing that get-byte does, but for multibyte characters, so that instead of an integer 0..255 I'd receive either a vector of integers 0..255 or a single long integer 0..2^32-1.
EDIT
Just in case anyone needs this later:
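(defun haxe-string-to-x-string (s)
  "Return S with every UTF-8 byte written as a \\xNN escape."
  (with-output-to-string
    (let (current)
      (dotimes (i (length s))
        (if (> 0 (multibyte-char-to-unibyte (aref s i)))
            ;; The character does not fit in one byte: encode it as
            ;; UTF-8 and print each resulting byte.
            (progn
              (setq current (encode-coding-string
                             (char-to-string (aref s i)) 'utf-8))
              (dotimes (j (length current))
                (princ (format "\\x%02x" (aref current j)))))
          ;; Single-byte character: print its value directly.
          (princ (format "\\x%02x" (aref s i))))))))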
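A shorter variant is possible, since ASCII characters encode to themselves in UTF-8, so the whole string can be encoded in one call and each byte formatted afterwards. This is only a sketch; the name my-string-to-x-string is made up here, and it should give the same output as the function above for ordinary text.

```elisp
;; Hypothetical alternative: encode the whole string once, then
;; format every byte of the resulting unibyte string.
(defun my-string-to-x-string (s)
  "Return S as a string of \\xNN escapes, one per UTF-8 byte."
  (mapconcat (lambda (byte) (format "\\x%02x" byte))
             (encode-coding-string s 'utf-8)
             ""))

(my-string-to-x-string "aĥ") ; => "\\x61\\xc4\\xa5"
```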
Upvotes: 6
Views: 1179
Reputation: 41648
encode-coding-string might be what you're looking for:
*** Welcome to IELM *** Type (describe-mode) for help.
ELISP> (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8)
"e\304\245o\305\235an\304\235o \304\211iu\304\265a\305\255de"
It returns a string, but you can access the individual bytes with aref:
ELISP> (aref (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8) 1)
196
ELISP> (format "%o" 196)
"304"
Or, if you don't mind using cl functions, concatenate is your friend (newer Emacs versions provide it as cl-concatenate in cl-lib):
ELISP> (concatenate 'list (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8))
(101 196 165 111 197 157 97 110 196 157 111 32 196 137 105 117 196 181 97 197 173 100 101)
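If you'd rather avoid cl, the built-in vconcat (or append with a trailing nil) should do the same job. Below is a rough sketch of the two shapes asked for in the question, a vector of bytes and a single integer; both function names are made up for this example, and packing the bytes big-endian is just one possible convention.

```elisp
;; Hypothetical helpers built on encode-coding-string.
(defun my-char-utf8-bytes (ch)
  "Return the UTF-8 bytes of character CH as a vector of integers."
  (vconcat (encode-coding-string (char-to-string ch) 'utf-8)))

(defun my-char-utf8-integer (ch)
  "Return the UTF-8 bytes of CH packed big-endian into one integer."
  (let ((result 0))
    ;; Shift the accumulated value left by one byte, then add the next.
    (mapc (lambda (byte) (setq result (logior (ash result 8) byte)))
          (my-char-utf8-bytes ch))
    result))

(my-char-utf8-bytes ?ĥ)   ; => [196 165]
(my-char-utf8-integer ?ĥ) ; => 50341, i.e. #xC4A5
```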
Upvotes: 5