Reputation: 12107
My goal is to get a random emoticon, from a list, in F#.
I started with this:
let pickOne (icons: string) : char = icons.[Helpers.random.Next(icons.Length)]
let happySymbols = "๐ฅ๐๐๐๐๐๐ช๐๐๐๐๐ค๐๐ค๐คฉ๐คช๐ค ๐ฅณ๐๐คค๐๐"
let sadSymbols = "๐ญ๐๐๐ฉ๐ข๐คฆ๐คท๐ฑ๐๐คจ๐๐ฌ๐๐คฎ๐ต๐คฏ๐ง๐๐๐ค๐ก๐คฌ"
That doesn't work because:
"๐ฅ๐๐๐๐๐๐ช๐๐๐๐๐ค๐๐ค๐คฉ๐คช๐ค ๐ฅณ๐๐คค๐๐".Length
returns 44, because Length counts the chars (UTF-16 code units) in a string, not the Unicode characters, and each of these emoji takes two chars. I can't just divide by 2, because I may add some characters that fit in a single char to the string at some point.
Indexing doesn't work either:
let a = "๐ฅ๐๐๐๐๐๐ช๐๐๐๐๐ค๐๐ค๐คฉ๐คช๐ค ๐ฅณ๐๐คค๐๐"
a.[0]
does not return ๐ฅ; I get an unknown-character symbol instead.
So plan B was to make this an array of chars instead of a string:
let a = [| '๐ฅ'; '๐'; '๐'; '๐'; '๐'; '๐'; '๐ช'; '๐'; '๐'; '๐'; '๐'; '๐ค'; '๐'; '๐ค'; '๐คฉ'; '๐คช'; '๐ค '; '๐ฅณ'; '๐'; '๐คค'; '๐'; '๐' |]
This does not compile. I get:
Parse error Unexpected quote symbol in binding. Expected '|]' or other token.
Why is that?
Anyhow, I can make a list of strings and get it to work, but I'm curious: is there a "proper" way to make the first version work and pick a random Unicode character from a Unicode string?
Upvotes: 4
Views: 1818
Reputation: 3470
Asti's answer works for your purpose, but I wasn't too happy about where we landed on this. I guess I got hung up on the word "proper" in the question. After a lot of research in various places, I got curious about the method String.EnumerateRunes, which in turn led me to the type Rune. The documentation for that type is particularly enlightening about proper string handling and about what is actually in a Unicode (UTF-16) string in .NET. I also experimented in LINQPad and got this:
let dump x = x.Dump()
let runes = "abcABCæøåÆØÅ๐๐๐★่จ่ง่ฆ่ฅ".EnumerateRunes().ToArray()
runes.Length |> dump
// 20
runes |> Array.iter (fun rune -> dump (string rune))
// a b c A B C æ ø å Æ Ø Å ๐ ๐ ๐ ★ ่จ ่ง ่ฆ ่ฅ
dump runes
// see screenshot
let smiley = runes.[13].ToString()
dump smiley
// ๐
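Applied to the original question, the rune approach could look roughly like the sketch below. It assumes .NET Core 3.0 or later (where Rune and String.EnumerateRunes exist). The names random and toRunes are made up for illustration, with random standing in for the Helpers.random from the question. The sample string uses \U escapes so the example does not depend on the file encoding.

```fsharp
open System
open System.Text

// Assumption: one shared Random instance, standing in for Helpers.random.
let random = Random()

// Split a string into runes (Unicode scalar values), one per code point.
let toRunes (text: string) : Rune[] =
    Seq.toArray (text.EnumerateRunes() :> seq<Rune>)

// Pick one random rune and return it as a string,
// because a single char cannot hold an emoji.
let pickOne (icons: string) : string =
    let runes = toRunes icons
    runes.[random.Next(runes.Length)].ToString()

// Two emoji (U+1F600, U+1F622) plus one ASCII letter.
let sample = "\U0001F600\U0001F622A"
printfn "%d runes, picked %s" (toRunes sample).Length (pickOne sample)
```

Note that pickOne returns a string rather than a char, which is the one signature change the original function needs.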
Upvotes: 4
Reputation: 12667
All strings in .NET are UTF-16 strings.
That's the definition of char:
Represents a character as a UTF-16 code unit.
Every character takes at least one code unit (2 bytes in UTF-16) and as many more as it needs. Emoji like the ones in the question lie outside the Basic Multilingual Plane and don't fit in 2 bytes, so each one is stored as a surrogate pair: 4 bytes, or 2 chars.
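The two-chars-per-emoji point can be checked directly. This is a minimal sketch; the value names (grin, codeUnits, isPair, codePoint) are illustrative only, and the emoji is written as a \U escape so the result does not depend on how the source file is encoded.

```fsharp
open System

// U+1F600 GRINNING FACE, outside the Basic Multilingual Plane.
let grin = "\U0001F600"

// Length counts UTF-16 code units, so one emoji reports 2.
let codeUnits = grin.Length

// The two chars are a high surrogate followed by a low surrogate.
let isPair = Char.IsHighSurrogate(grin.[0]) && Char.IsLowSurrogate(grin.[1])

// Combine the pair back into the single code point it encodes.
let codePoint = Char.ConvertToUtf32(grin, 0)

printfn "%d %b U+%X" codeUnits isPair codePoint
// prints: 2 true U+1F600
```

This is also why grin.[0] on its own shows up as an unknown character: it is only the first half of the pair.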
So what's the solution? align(4) all the things! (insert GCC joke here).
First we convert everything into UTF-32:
open System.Text
// Re-encode as UTF-32: exactly 4 bytes per code point.
let utf32 (source: string) =
    Encoding.Convert(Encoding.Unicode, Encoding.UTF32, Encoding.Unicode.GetBytes(source))
Then we can pick and choose any "character":
let pick (arr: byte[]) index =
    Encoding.UTF32.GetString(arr, index * 4, 4)
Test:
let happySymbols = "๐ฅ๐๐๐๐๐๐ช๐๐๐๐๐ค๐๐ค๐คฉ๐คช๐ค ๐ฅณ๐๐คค๐๐YTHO"
> pick (utf32 happySymbols) 0;;
val it : string = "๐ฅ"
> pick (utf32 happySymbols) 22;;
val it : string = "Y"
For the actual length, just divide the byte count by 4.
open System
let surpriseMe arr =
    let rnd = Random()
    pick arr (rnd.Next(0, arr.Length / 4))
Hmmm
> surpriseMe (utf32 happySymbols);;
val it : string = "๐"
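The divide-by-4 arithmetic can be sanity-checked on a small string. This sketch swaps in Encoding.UTF32.GetBytes as a shortcut for the Encoding.Convert call above (both should give the same bytes, without a byte order mark). The names sample, utf32Bytes, codePoints and second are illustrative.

```fsharp
open System.Text

// "A" plus one emoji (U+1F600): 3 UTF-16 chars, but 2 code points.
let sample = "A\U0001F600"

// UTF-32 uses exactly 4 bytes per code point.
let utf32Bytes = Encoding.UTF32.GetBytes(sample)
let codePoints = utf32Bytes.Length / 4

// Decode the second 4-byte slot back into a string: the emoji.
let second = Encoding.UTF32.GetString(utf32Bytes, 4, 4)

printfn "%d chars, %d bytes, %d code points" sample.Length utf32Bytes.Length codePoints
// prints: 3 chars, 8 bytes, 2 code points
```

Mixing ASCII letters and emoji in one string therefore still indexes correctly, which is what the "YTHO" test above relies on.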
Upvotes: 2