Reputation: 19345
I've come up with 2 methods to generate relatively short random strings- one is much faster and simpler and the other much slower but I think more random. Is there a not-super-complicated method or way to measure how random the data from each method might be?
I've tried compressing the output strings (via zlib) figuring the more truly random the data, the less it will compress but that hasn't proved much.
Upvotes: 10
Views: 917
Reputation: 19855
An outcome is considered random if it can't be predicted ahead of time with certainty. If it can be predicted with certainty it is considered deterministic. This is a binary categorization, outcomes either are deterministic or random, there aren't degrees of randomness. There are, however, degrees of predictability. One measure of predictability is entropy, as mentioned by EMS.
Consider two games. You don't know on any given play whether you're going to win or lose. In game 1, the probability of winning is 1/2, i.e., you win about half the time in the long run. In game 2, the odds of winning are 1/100. Both games are considered random, because the outcome isn't a dead certainty. Game 1 has greater entropy than game 2, because the outcome is less predictable - while there's a chance of winning, you're pretty sure you're going to lose on any given trial.
The amount of compression that can be achieved (by a good compression algorithm) for a sequence of values is related to the entropy of the sequence. English has pretty low entropy (lots of redundant info both in the relative frequency of letters and the sequences of words that occur as groups), and hence tends to compress pretty well.
Upvotes: 0
Reputation: 7130
You can use some mapping to convert strings to numeric and then apply standard tests like Diehard
and TestU01
. Note that long sequences of samples are needed (typically few MB files will do)
Upvotes: 0
Reputation: 77464
You are using standard compression as a proxy for the uncomputable Kolmogorov Complexity, which is the "right" mathematical framework for quantifying randomness (but, unfortunately, is not computable).
You might also try some measure of entropy if you are willing to assume some kind of distribution over strings.
Upvotes: 9