schuelermine
schuelermine

Reputation: 2288

Why does the shuffle' function require an Int parameter?

In System.Random.Shuffle,

shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]

The hackage page mentions this Int argument as

..., its length,...

However, it seems that a simple wrapper function like

shuffle'' x = shuffle' x (length x)

should've sufficed.

Upvotes: 2

Views: 104

Answers (2)

Yann Vernier
Yann Vernier

Reputation: 15887

shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure would be that it only holds remaining items in an immutable form; lazy updates work for it. But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation. System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be because they didn't want it to touch the list argument at all; it's only passed into shuffle.

The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. Probably the reasons have to with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.

Data.Random.Extras seems to have a more straight forward Seq-based shuffle function.

Upvotes: 4

It might be a case when length of the given list is already known, and doesn't need to be calculated again. Thus, it might be considered as an optimisation.

Besides, in general, the resulting list doesn't need to have the same size as the original one. Thus, this argument could be used for setting this length.

This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):

-- examples

t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.

t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements

However, it's not the same for the 'random-shuffle' package implementation:

> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths

I think it worth to follow-up with the packages maintainers in order to understand the contract of this function.

Upvotes: 3

Related Questions