Randomized algorithm not behaving as expected

Question

I am implementing an approximate counting algorithm where we:

Maintain a counter X using log (log n) bits

Initialize X to 0
When an item arrives, increase X by 1 with probability (½)^X
When the stream is over, output 2^X − 1 so that E[2^X]= n + 1

My implementation is as follows:

import System.Random

type Prob   = Double
type Tosses = Int

-- * for sake of simplicity we assume 0 <= p <= 1
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= 100*p, s')
  where (q,s') = randomR (1,100) s

toses :: Prob -> Tosses -> StdGen -> [(Bool,StdGen)]
toses _ 0 _ = []
toses p n s = let t@(b,s') = tos p s in t : toses p (pred n) s'

toses' :: Prob -> Tosses -> StdGen -> [Bool]
toses' p n = fmap fst . toses p n

morris :: StdGen -> [a] -> Int
morris s xs = go s xs 0 where
  go _ []     n = n
  go s (_:xs) n = go s' xs n' where
    (h,s') = tos (0.5^n) s 
    n'     = if h then succ n else n

main :: IO Int
main = do
  s <- newStdGen
  return $ morris s [1..10000]

The problem is that my X is always incorrect for any |stream| > 2, and it seems like for all StdGen and |stream| > 1000, X = 7

I tested the same algorithm in Matlab and it works there, so I assume it's either

an issue with my random number generator, or
raising 1/2 to a large n in Double

Please suggest a path forward?

leftaroundabout · Accepted Answer

The problem is actually very simple: with randomR (1,100) you preclude values within the first percent, so you have a complete cutoff at high powers of 1/2 (which all lie in that small interval). Actually a general thing: ranges should start at zero, not at one^†, unless there's a specific reason.

But why even use a range of 100 in the first place? I'd just make it

tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= p, s')
  where (q,s') = randomR (0,1) s

^†_{I know, Matlab gets this wrong all over the place. Just one of the many horrible things about that language.}

Unrelated to your problem: as chi remarked this kind of code looks a lot nicer if you use a suitable random monad, instead of manually passing around StdGens.

import Data.Random
import Data.Random.Source.Std

type Prob   = Double

tos :: Prob -> RVar Bool
tos p = do
  q <- uniform 0 1
  return $ q <= p

morris :: [a] -> RVar Int
morris xs = go xs 0 where
  go []     n = return n
  go (_:xs) n = do
    h <- tos (0.5^n)
    go xs $ if h then succ n else n

morrisTest :: Int -> IO Int
morrisTest n = do
  runRVar (morris [1..n]) StdRandom

Randomized algorithm not behaving as expected

Answers (1)

Related Questions