Reputation: 1111
I am receiving a string from a service that apparently encodes its Unicode characters as UTF-32 escape sequences such as \U0001B000 (C-style Unicode escapes). To serialize this information as JSON, however, I have to encode it as UTF-16 escapes such as \uD82C\uDC00.
I have no idea how to read such an encoded string in Java/Clojure, or how to produce output in the other format.
Upvotes: 3
Views: 845
Reputation: 3418
You can read the received bytes from the service using:
(slurp received-bytes :encoding "UTF-32")
and write a string using:
(spit destination string-to-encode :encoding "UTF-16")
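As a rough illustration of what those two calls do, the sketch below round-trips U+1B000 in memory. The names sample, decoded and out are made up for the example, and an in-memory byte array and stream stand in for the real service and destination.

```clojure
;; Illustrative only: build the UTF-32 bytes locally instead of calling a service.
(def sample (String. (Character/toChars 0x1B000)))
(def received-bytes (.getBytes sample "UTF-32"))

;; slurp accepts a byte array and decodes it with the given charset.
(def decoded (slurp received-bytes :encoding "UTF-32"))

;; spit accepts an OutputStream; Java's "UTF-16" charset writes a byte-order mark first.
(def out (java.io.ByteArrayOutputStream.))
(spit out decoded :encoding "UTF-16")

(= sample decoded)        ; the decoded string equals the original
(vec (.toByteArray out))  ; the UTF-16 bytes, including the leading byte-order mark
```

Note that this only changes the byte encoding of the text. It does not by itself produce the literal \uXXXX escape text that JSON uses.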
If you instead mean that you have a string containing the escape sequence as literal text (for example, the ten characters \U0001B000), then you can decode it to a real string using:
(defn utf32->str [utf32-str]
  (let [buf (java.nio.ByteBuffer/allocate 4)]
    (.putInt buf (Integer/parseInt (subs utf32-str 2) 16))
    (String. (.array buf) "UTF-32")))

(utf32->str "\\U0001B000")
and then render it as UTF-16 escapes (with the lowercase \u prefix that JSON requires) using:
(defn str->utf16 [s]
  (let [byte->str #(format "%02x" %)]
    (apply str
           ;; drop the first pair: it is the byte-order mark (FE FF) that "UTF-16" prepends
           (drop 1 (map #(str "\\u" (byte->str (first %)) (byte->str (second %)))
                        (partition 2 (.getBytes s "UTF-16")))))))
Here is a sample run:
(str->utf16 (utf32->str "\\U0001B000"))
;=> "\\ud82c\\udc00"
Upvotes: 2
Reputation: 6073
Once you have the string you want to replace, the following function will do it:
(defn escape-utf16
  [[_ _ a b c d]]
  (format "\\u%02X%02X\\u%02X%02X" a b c d))

(defn replace-utf32
  [^String s]
  (let [n (Integer/parseInt (subs s 2) 16)]
    (-> (->> (map #(bit-shift-right n %) [24 16 8 0])
             (map #(bit-and % 0xFF))
             (byte-array))
        (String. "UTF-32")
        (.getBytes "UTF-16")
        (escape-utf16))))

(replace-utf32 "\\U0001B000")
;; => "\\uD82C\\uDC00"
And, for targeted replacement, use a regex:
(require '[clojure.string :as string])

(string/replace
  "this is a text \\U0001B000."
  #"\\U[0-9A-F]{8}"
  replace-utf32)
;; => "this is a text \\uD82C\\uDC00."
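Disclaimer: I haven't given any thought to edge cases, or to any case other than the one provided. For instance, escape-utf16 assumes the code point needs a surrogate pair, and the regex only matches uppercase hex digits. But I'm sure you can use this as a base for further exploration.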
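One possible direction for that exploration, as a minimal sketch rather than a definitive implementation: Character/toChars already splits a code point into one or two UTF-16 code units, so it covers code points inside the Basic Multilingual Plane as well as those that need a surrogate pair. The function names code-point->json-escape and utf32-escapes->json-escapes are hypothetical, and accepting lowercase hex digits is an assumption about the input.

```clojure
(require '[clojure.string :as string])

(defn code-point->json-escape
  "Hypothetical helper: turns a code point into one or two \\uXXXX escapes."
  [cp]
  (->> (Character/toChars (int cp))        ; 1 char for BMP, 2 for a surrogate pair
       (map #(format "\\u%04X" (int %)))   ; each UTF-16 code unit as 4 hex digits
       (apply str)))

(defn utf32-escapes->json-escapes
  "Hypothetical helper: rewrites every \\UXXXXXXXX escape found in s."
  [s]
  (string/replace s
                  #"\\U([0-9A-Fa-f]{8})"   ; assumption: either hex case may appear
                  (fn [[_ hex]]
                    (code-point->json-escape (Integer/parseInt hex 16)))))

(utf32-escapes->json-escapes "this is a text \\U0001B000.")
;; => "this is a text \\uD82C\\uDC00."
(utf32-escapes->json-escapes "\\U00000041")
;; => "\\u0041"
```

Values above 10FFFF are not valid code points and make this sketch throw, so input that may contain them would need an extra check.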
Upvotes: 1