doctopus

Reputation: 5657

How to generate strings drawn from every possible character?

At the moment I'm generating strings like this:

arbStr :: Gen String
arbStr = listOf $ elements (alpha ++ digits)
  where alpha = ['a'..'z']
        digits = ['0'..'9']

But obviously this only generates strings of alphanumeric characters. How can I generate strings drawn from all possible characters?

Upvotes: 1

Views: 278

Answers (3)

Redu

Reputation: 26161

If we evaluate

λ> length ([minBound..maxBound] :: [Char])
1114112

we get the total number of characters, and it is a big one. If you think the list is too big, you can always narrow it to a range with drop x . take y.
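As a small illustration of the drop x . take y idea, here is a minimal sketch. The name window is hypothetical and only used for this example; x and y are positions in the list of all characters, which coincide with code points because the list starts at code point 0.

```haskell
-- Slice the list of all characters by position: keep the first y
-- characters, then drop the first x of those.
-- `window` is a hypothetical helper name, not part of any library.
window :: Int -> Int -> String
window x y = drop x . take y $ [minBound .. maxBound]
```

Because the list is lazy, only the first y characters are ever built. For instance, window 97 123 evaluates to "abcdefghijklmnopqrstuvwxyz", since 'a' is code point 97 and 'z' is 122.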

Accordingly, if you need n random characters, shuffle the list with a function shuffle :: [a] -> IO [a] and take n from the shuffled list.

Edit:

Well, of course, since shuffling could be expensive, it's best to choose a cleverer strategy. Ideally we would randomly limit the list of all characters. So:

  1. Define limits = liftM sort . mapM randomRIO $ replicate 2 (0,1114112) :: (Ord a, Random a, Num a) => IO [a]

  2. limits >>= \[min,max] -> return . drop min . take max $ ([minBound..maxBound] :: [Char])

  3. Finally, take n characters by applying liftM (take n) to the result of item 2.
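The three steps can be put together roughly as follows. This is only a sketch under some assumptions: it uses randomRIO from the random package, picks the two limits with min and max instead of sorting a list, and the name randomWindow is made up for the example. Note that what comes back is a run of up to n consecutive characters starting at a random position, not n independently chosen characters.

```haskell
import System.Random (randomRIO)

-- Hypothetical helper: pick two random limits, slice the list of all
-- characters between them, and keep at most n characters of the slice.
randomWindow :: Int -> IO String
randomWindow n = do
  a <- randomRIO (0, 1114112)
  b <- randomRIO (0, 1114112)
  let lo = min a b   -- lower limit of the window
      hi = max a b   -- upper limit of the window
  return . take n . drop lo . take hi $ [minBound .. maxBound]
```

The result can be shorter than n (even empty) when the two limits happen to fall close together.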

Upvotes: 1

willeM_ Van Onsem

Reputation: 476709

Char is an instance of both the Enum and Bounded type classes, so you can make use of the arbitraryBoundedEnum :: (Bounded a, Enum a) => Gen a function:

import Test.QuickCheck(Gen, arbitraryBoundedEnum, listOf)

arbStr :: Gen String
arbStr = listOf arbitraryBoundedEnum

For example:

Prelude Test.QuickCheck> sample arbStr
""
""
"\821749"
"\433465\930384\375110\256215\894544"
"\431263\866378\313505\1069229\238290\882442"
""
"\126116\518750\861881\340014\42369\89768\1017349\590547\331782\974313\582098"
"\426281"
"\799929\592960\724287\1032975\364929\721969\560296\994687\762805\1070924\537634\492995\1079045\1079821"
"\496024\32639\969438\322614\332989\512797\447233\655608\278184\590725\102710\925060\74864\854859\312624\1087010\12444\251595"
"\682370\1089979\391815"

Or you can make use of arbitrary from the Arbitrary instance for Char:

import Test.QuickCheck(Gen, arbitrary, listOf)

arbStr :: Gen String
arbStr = listOf arbitrary

Note that the arbitrary for Char is implemented such that ASCII characters are (three times) more common than non-ASCII characters, so the "distribution" is different.
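One way to see the difference between the two generators is to sample each and measure how many of the characters are ASCII. The sketch below is only illustrative: asciiShare is a hypothetical helper name and the sample size of 10000 is an arbitrary choice.

```haskell
import Data.Char (isAscii)
import Test.QuickCheck (Gen, arbitrary, arbitraryBoundedEnum, generate, vectorOf)

-- Hypothetical helper: draw 10000 characters from a generator and
-- return the fraction of them that are ASCII.
asciiShare :: Gen Char -> IO Double
asciiShare gen = do
  cs <- generate (vectorOf 10000 gen)
  return (fromIntegral (length (filter isAscii cs)) / fromIntegral (length cs))
```

Running asciiShare arbitrary and asciiShare arbitraryBoundedEnum in GHCi should show a much larger ASCII share for the first, because only 128 of the 1114112 characters are ASCII and a uniform pick rarely lands on one.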

Upvotes: 2

amalloy

Reputation: 91907

Since Char is an instance of Bounded as well as Enum (confirm this by asking GHCi for :i Char), you can simply write

[minBound..maxBound] :: [Char]

to get a list of all legal characters. Obviously this will not lead to efficient random access, though! So you could instead convert the bounds to Int with Data.Char.ord :: Char -> Int, use QuickCheck's ability to select from a range of integers, and then map back to a character with Data.Char.chr :: Int -> Char.
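A minimal sketch of that approach, assuming QuickCheck's choose is the range-selection function meant here (the answer does not name it); anyChar is a hypothetical name for the helper generator:

```haskell
import Data.Char (chr, ord)
import Test.QuickCheck (Gen, choose, listOf)

-- Pick a code point uniformly between the bounds of Char, then
-- convert it back to a character. `anyChar` is a hypothetical name.
anyChar :: Gen Char
anyChar = chr <$> choose (ord minBound, ord maxBound)

-- Strings whose characters are drawn from the whole Char range.
arbStr :: Gen String
arbStr = listOf anyChar
```

This avoids building or indexing the full list of characters, since each character costs only one random integer.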

Upvotes: 1
