thanks_in_advance

Reputation: 2743

Is there a way to decipher a given encoding?

On Twitter, this user: https://twitter.com/Rockprincess818

seems to have used creative encoding techniques to achieve special formatting:

They list their name as:

𝓛𝓲𝓼𝓪

And their bio as:

𝐈'𝐦 𝐧𝐨𝐭 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐲𝐨𝐮𝐫 𝐚𝐦𝐮𝐬𝐞𝐦𝐞𝐧𝐭. 𝐘𝐨𝐮'𝐫𝐞 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐦𝐢𝐧𝐞.

None of this seems to be a standard encoding (nor even English -- though I could be wrong about this).

My questions:

  1. What did they do to achieve this special formatting?
  2. How does one decipher such non-normal text to understand what's going on?

Upvotes: 0

Views: 53

Answers (2)

snakecharmerb

Reputation: 55640

The Unicode standard has a concept of compatibility equivalence, which defines some code points as equivalent to others. For the strings in the question, NFKC normalisation (Normalization Form Compatibility Composition) can be applied to obtain the equivalent Latin characters. Many programming languages provide tools to apply normalisation programmatically.

In JavaScript, the String.prototype.normalize method can be used:

name = '𝓛𝓲𝓼𝓪'
"𝓛𝓲𝓼𝓪"
bio = "𝐈'𝐦 𝐧𝐨𝐭 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐲𝐨𝐮𝐫 𝐚𝐦𝐮𝐬𝐞𝐦𝐞𝐧𝐭. 𝐘𝐨𝐮'𝐫𝐞 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐦𝐢𝐧𝐞."
"𝐈'𝐦 𝐧𝐨𝐭 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐲𝐨𝐮𝐫 𝐚𝐦𝐮𝐬𝐞𝐦𝐞𝐧𝐭. 𝐘𝐨𝐮'𝐫𝐞 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐦𝐢𝐧𝐞."
name.normalize('NFKC')
"Lisa"
bio.normalize('NFKC')
"I'm not here for your amusement. You're here for mine."

In Python, the unicodedata.normalize function can be used:

>>> import unicodedata as ud
>>> name = '𝓛𝓲𝓼𝓪'
>>> bio = "𝐈'𝐦 𝐧𝐨𝐭 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐲𝐨𝐮𝐫 𝐚𝐦𝐮𝐬𝐞𝐦𝐞𝐧𝐭. 𝐘𝐨𝐮'𝐫𝐞 𝐡𝐞𝐫𝐞 𝐟𝐨𝐫 𝐦𝐢𝐧𝐞."
>>> ud.normalize('NFKC', name)
'Lisa'
>>> ud.normalize('NFKC', bio)
"I'm not here for your amusement. You're here for mine."
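To see what is actually going on in such text, it can help to look up each character in the Unicode database. The following is a minimal sketch using Python's unicodedata module; the helper name describe is made up for illustration, and the two sample characters are written as escapes for the first letter of the name and of the bio from the question.

```python
import unicodedata as ud


def describe(text):
    """Return (code point, Unicode name, decomposition) for each character.

    `describe` is a hypothetical helper name used only for this sketch.
    """
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            ud.name(ch, "<unnamed>"),   # official character name, if any
            ud.decomposition(ch),       # compatibility mapping, e.g. '<font> 004C'
        ))
    return rows


# First letter of the name and of the bio in the question.
sample = "\U0001D4DB\U0001D408"
for code, name, decomp in describe(sample):
    print(code, name, decomp)
# U+1D4DB MATHEMATICAL BOLD SCRIPT CAPITAL L <font> 004C
# U+1D408 MATHEMATICAL BOLD CAPITAL I <font> 0049
```

The `<font>` tag in the decomposition is the compatibility mapping that NFKC follows to reach the plain Latin letter.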

Upvotes: 1

Petr Srníček

Reputation: 2386

1) There are many online generators (e.g. this one or this one) that let users convert normal text into a fancy graphical representation by replacing Latin letters with similar-looking Unicode symbols.

2) The most obvious way to decipher such text back to normal Latin characters is to find out which tool the user used and what mapping that tool employs. You can then map the fancy Unicode code points back to Latin characters. You can find the mapping by, e.g., converting "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" to "cursive" with the tool and analyzing the output.
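The reverse-mapping idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: the names PLAIN, BOLD, build_reverse_table and decode are made up here, and BOLD stands in for whatever a generator would output for the 52-letter alphabet. It is built from the Mathematical Bold code points (U+1D400 onward), which happen to be contiguous; in practice you would paste the tool's actual output instead.

```python
import string

# The plain alphabet you would feed into the generator.
PLAIN = string.ascii_uppercase + string.ascii_lowercase

# Assumption: stand-in for the generator's output for PLAIN.
# Mathematical Bold letters occupy U+1D400..U+1D433 in A-Z, a-z order.
BOLD = "".join(chr(0x1D400 + i) for i in range(len(PLAIN)))


def build_reverse_table(fancy, plain=PLAIN):
    """Map each fancy character back to the plain letter at the same position."""
    if len(fancy) != len(plain):
        raise ValueError("fancy and plain alphabets must have the same length")
    return str.maketrans(fancy, plain)


def decode(text, table):
    """Replace mapped characters; anything not in the table is left untouched."""
    return text.translate(table)


table = build_reverse_table(BOLD)
sample = "\U0001D408'\U0001D426"  # bold I, apostrophe, bold m
print(decode(sample, table))  # I'm
```

Unlike NFKC normalisation, a hand-built table like this also works for look-alike symbols that have no compatibility mapping, but it only covers the alphabets you have explicitly collected.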

Upvotes: 1
