Thomas

Reputation: 12107

Getting a random emoji/character from a unicode string

My goal is to get a random emoticon, from a list, in F#.

I started with this:

let pickOne (icons: string) : char = icons.[Helpers.random.Next(icons.Length)]
let happySymbols = "🔥😂😊😁🙏😎💪😋😇🎉🙌🤘👍🤑🤩🤪🤠🥳😌🤤😍😀"
let sadSymbols   = "😭😔😒😩😢🤦🤷😱👎🤨😑😬🙄🤮😵🤯🧐😕😟😤😡🤬"

That doesn't work, because:

"🔥😂😊😁🙏😎💪😋😇🎉🙌🤘👍🤑🤩🤪🤠🥳😌🤤😍😀".Length

returns 44: Length returns the number of chars in the string, and each of these emoji takes up two chars. I can't just divide by 2, because I may add some characters that fit in a single char to the string at some point.

Indexing doesn't work either:

let a = "🔥😂😊😁🙏😎💪😋😇🎉🙌🤘👍🤑🤩🤪🤠🥳😌🤤😍😀"
a.[0]

will not return 🔥; I get an unknown-character symbol instead.

So plan B was to make this an array instead of a string:

let a = [| '🔥'; '😂'; '😊'; '😁'; '🙏'; '😎'; '💪'; '😋'; '😇'; '🎉'; '🙌'; '🤘'; '👍'; '🤑'; '🤩'; '🤪'; '🤠'; '🥳'; '😌'; '🤤'; '😍'; '😀' |]

This does not compile. I get:

Parse error Unexpected quote symbol in binding. Expected '|]' or other token.

Why is that?

Anyhow, I can make a list of strings and get it to work, but I'm curious: is there a "proper" way to make the first approach work and take a random Unicode character from a string?

Upvotes: 4

Views: 1818

Answers (2)

Bent Tranberg

Reputation: 3470

Asti's answer works for your purpose, but I wasn't too happy about where we landed on this. I guess I got hung up on the word "proper" in the question. After a lot of research in various places, I got curious about the method String.EnumerateRunes, which in turn led me to the type Rune. The documentation for that type is particularly enlightening about proper string handling and about what is actually in a UTF-16 string in .NET. I also experimented in LINQPad and got this.

let dump x = x.Dump()
let runes = "abcABCæøåÆØÅ😂😊😁₅茨茧茦茥".EnumerateRunes().ToArray()
runes.Length |> dump
// 20
runes |> Array.iter (fun rune -> dump (string rune))
// a b c A B C æ ø å Æ Ø Å 😂 😊 😁 ₅ 茨 茧 茦 茥
dump runes
// see screenshot
let smiley = runes.[13].ToString()
dump smiley
// 😊

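Applied to the original question, the same idea could look roughly like the sketch below. It assumes .NET Core 3.0 or later (where Rune and String.EnumerateRunes are available); the names rnd and sample are made up for illustration, and pickOne is reshaped to return a string rather than a char, since a single char cannot hold an emoji.

```fsharp
open System
open System.Linq
open System.Text

// Hypothetical shared generator, reused across calls.
let rnd = Random()

/// Picks one random Rune (Unicode scalar value) from the string
/// and returns it as a string.
let pickOne (icons: string) : string =
    let runes : Rune[] = icons.EnumerateRunes().ToArray()
    runes.[rnd.Next(runes.Length)].ToString()

// Illustrative input: two emoji (U+1F525, U+1F602) plus one ASCII letter.
let sample = "\U0001F525\U0001F602Y"
printfn "%s" (pickOne sample)
```

Because the runes array has one entry per code point, mixing emoji with ordinary letters in the same string needs no special handling.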

Upvotes: 4

Asti

Reputation: 12667

All strings in .NET are UTF-16 strings, that is, sequences of 16-bit code units. That's the definition of char:

Represents a character as a UTF-16 code unit.

Every character takes up at least one code unit (2 bytes in UTF-16) and as many more as it requires. These emoji don't fit in 2 bytes, so each one is stored as a surrogate pair: 4 bytes, or 2 chars.
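A quick way to see this is to inspect one emoji directly. The following is a small sketch; the name fire is only illustrative, and the emoji is written as a \U escape so the source file's encoding does not matter.

```fsharp
open System

// U+1F525 (the fire emoji), which lies outside the 16-bit range.
let fire = "\U0001F525"

// One emoji, but two UTF-16 code units (chars).
printfn "%d" fire.Length                                  // 2

// The two chars form a high/low surrogate pair.
printfn "%b" (Char.IsSurrogatePair(fire.[0], fire.[1]))   // true

// Combining the pair gives back the single code point.
printfn "0x%X" (Char.ConvertToUtf32(fire, 0))             // 0x1F525
```

This is also why indexing the string with .[0] yields only half of the emoji, and why an emoji cannot be written as a single char literal.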

So what's the solution? align(4) all the things! (insert GCC joke here).

First we convert everything into UTF-32:

open System
open System.Text
let utf32 (source: string) =
    Encoding.Convert(Encoding.Unicode, Encoding.UTF32, Encoding.Unicode.GetBytes(source))

Then we can pick and choose any "character":

let pick (arr: byte[]) index = 
    Encoding.UTF32.GetString(arr, index * 4, 4)

Test:

let happySymbols = "🔥😂😊😁🙏😎💪😋😇🎉🙌🤘👍🤑🤩🤪🤠🥳😌🤤😍😀YTHO"

> pick (utf32 happySymbols) 0;;
val it : string = "🔥"

> pick (utf32 happySymbols) 22;;
val it : string = "Y"

For the actual length, just divide the byte count by 4.

let surpriseMe arr =
    let rnd = Random()
    pick arr (rnd.Next(0, arr.Length / 4))

Hmmm

> surpriseMe (utf32 happySymbols);;
val it : string = "😍"
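To sanity-check the divide-by-4 arithmetic, here is a small sketch on a shorter string. The names sample and bytes are made up for illustration; it calls Encoding.UTF32.GetBytes directly, which should give the same bytes as the utf32 helper above.

```fsharp
open System.Text

// Illustrative input: two emoji (U+1F525, U+1F602) followed by four ASCII letters.
let sample = "\U0001F525\U0001F602YTHO"
let bytes = Encoding.UTF32.GetBytes(sample)

printfn "%d" sample.Length        // 8 chars: 2 * 2 for the emoji + 4 letters
printfn "%d" bytes.Length         // 24 bytes: 6 code points * 4 bytes each
printfn "%d" (bytes.Length / 4)   // 6 code points

// Index 2 is the first letter, whatever the width of the emoji before it.
printfn "%s" (Encoding.UTF32.GetString(bytes, 2 * 4, 4))   // Y
```

Every code point occupies exactly 4 bytes in UTF-32, so the index arithmetic stays the same whether the string holds emoji, letters, or both.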

Upvotes: 2
