DrVilepis

Reputation: 61

String with literal unicode value to unicode character in Rust

I have a string like this, for example:

"HITMAN\u2122 Free Trial"

Is there any way I can convert the \u2122 to an actual Unicode character, so that the string looks like this?

"HITMAN™ Free Trial"

Edit: for clarification, the first string is a UTF-8 string from an API, and I need to parse it for display.

Upvotes: 4

Views: 7040

Answers (2)

Denys Séguret

Reputation: 382092

The first form is the escape syntax found in many languages and, for example, in JSON. It differs from Rust string literals, which use \u{2122} instead of \u2122.

This gives us a solution: parse it as JSON.


let s = "HITMAN\\u2122 Free Trial";
let s: String = serde_json::from_str(&format!("\"{}\"", s)).unwrap();
assert_eq!(
    s,
    "HITMAN™ Free Trial",
);

While this is useful for validation, you probably don't want to pull in a deserializer just for this, so you may prefer to do the parsing yourself, for example with a regular expression:

use lazy_regex::*;

let s = "HITMAN\\u2122 Free Trial";
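// \u is followed by four hexadecimal digits, so match [0-9a-fA-F] rather than \d
let s = regex_replace_all!(r#"\\u([0-9a-fA-F]{4})"#, s, |_, num: &str| {
    let num: u32 = u32::from_str_radix(num, 16).unwrap();
    // from_u32 returns None for surrogate halves (D800..=DFFF), so this unwrap panics on them
    let c: char = std::char::from_u32(num).unwrap();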
    c.to_string()
});
assert_eq!(
    s,
    "HITMAN™ Free Trial",
);
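
If you would rather avoid any extra crate, the same replacement can be sketched with the standard library alone. This is only a minimal sketch: the function name decode_unicode_escapes is made up for illustration, it assumes every escape is exactly four hex digits, it leaves surrogate halves such as \ud83d untouched instead of combining pairs, and it does not treat an escaped backslash specially, so it is not a full JSON unescaper.

```rust
/// Replaces `\uXXXX` escapes (exactly four hex digits) with the character
/// they name. Anything that cannot be decoded is copied through unchanged.
/// Hypothetical helper, not part of any library.
fn decode_unicode_escapes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find("\\u") {
        // Copy everything before the backslash as-is.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 2..];
        // `get(..4)` is None when fewer than four bytes remain or when
        // byte 4 is not a char boundary.
        let decoded = after
            .get(..4)
            .filter(|hex| hex.chars().all(|c| c.is_ascii_hexdigit()))
            .and_then(|hex| u32::from_str_radix(hex, 16).ok())
            // None for surrogate halves, which are not valid `char`s.
            .and_then(char::from_u32);
        match decoded {
            Some(c) => {
                out.push(c);
                rest = &after[4..];
            }
            None => {
                // Not a decodable escape: keep the `\u` literally.
                out.push_str("\\u");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Calling decode_unicode_escapes("HITMAN\\u2122 Free Trial") returns the string with the trademark sign in place of the escape. Unlike the unwrap-based version, malformed input is passed through rather than causing a panic.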

Upvotes: 3

Riton Elion

Reputation: 153

If the string is a literal in your own source code, just add { and } around 2122, like this:

let unicode_str = "HITMAN\u{2122} Free Trial";
println!("{}", unicode_str);
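
Note that this escape is resolved by the compiler, so it only applies to literals written in source code. A small sketch of the difference, using a raw string to stand in for text that arrives at runtime (the function name literal_vs_runtime is made up for illustration):

```rust
/// Returns (escaped literal, raw text) so the two can be compared.
/// Hypothetical helper for illustration only.
fn literal_vs_runtime() -> (&'static str, &'static str) {
    // The compiler turns `\u{2122}` into the single character U+2122.
    let literal = "HITMAN\u{2122} Free Trial";
    // A raw string keeps the backslash, `u`, and four digits as six separate
    // characters, which is what the text from an API looks like.
    let runtime = r"HITMAN\u2122 Free Trial";
    (literal, runtime)
}
```

The second string still contains the six characters of the escape, so text received from an API has to be decoded at runtime.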

Upvotes: 8
