rhardih
rhardih

Reputation: 893

Random index distribution weirdness

I stumbled onto this, trying to do a random biased sample from some data. It seems a simple distribution fitted to x^2 is what I'm looking for, but there's an artefact here I can't quite wrap my head around.

Here's a snippet of a for loop selecting an index in an array distributed by x^2, and then incrementing the counter at that index position.

package main
import "time"
import "fmt"
import "math"
import "math/rand"

func main() {
  rand.Seed(time.Now().UTC().UnixNano())

  var arr [10]int

  for i := 0; i < 5000; i++ {
    rnd := rand.Float64()
    tmp := rnd * rnd * 9

    index := int(math.Floor(tmp + .5))

    arr[index]++
  }
  fmt.Printf("%v", arr)
}

No matter the bounds or the number of iterations, plotting the values the graph always comes out looking like this, with a noticable "drop" at the end.

enter image description here

This is what I have trouble understanding. Shouldn't the indexes fit the curve all the way?

I'm suspecting something related to the rounding, but I'm grasping for straws at the moment.

Upvotes: 2

Views: 130

Answers (2)

Stormwind
Stormwind

Reputation: 824

First, your X-scale is misleading, as it starts from 1 and ends with 10. Should be 0...9.

Considering that it would be fixed, your distribution is fully correct, though maybe not intended (what did you actually want?).

You first have a distribution between 0 and 9, both inclusive. If you add 0.5 and then round down, ask yourself how many hits each index can acually "get"?

A: Most indexes get a "full set" with decimal values between 1 and 2 (or 6 and 7, or any other interval) which gets rounded down to 1 (or 6, or any index)

EXCEPT

The edge indexes 0 and 9 only get a "half full set".

Because you offset index 0...1 to 0.5...1.5 and round down. Only half of this range will then remain for index=0, ie. values between 0.5 and 1 (as there are no longer any between 0 and 0.5).

Same with other end. You offset 8...9 to 8.5...9.5 and then round down. Index 9 only gets 1/2, ie. values between 9 and 9.5.

The left end of your chart is actually lower than you probably expected, though it's not as distinguishable as the right end.

The numbers are indeed sometimes surprising :-).

Upvotes: 1

Greg C
Greg C

Reputation: 153

The problem is that your distribution has the range [0,1], and then you multiply it by 9, making the range [0,9], then you add 0.5, which makes the range [0.5, 9.5].

Not only is there a noticeable drop in the last index value, there is also an unnoticeable drop in the first index value, since each bucket is only half filled.

Have you considered simply multiplying by 10 rather than 9

tmp := rnd * rnd * 10

And then leaving off the + 0.5 in your Floor?

index := int(math.Floor(tmp))

That produces a distribution like you would expect, here are a few results for a loop going to 500,000:

[157949 65411 50239 42599 37637 33706 31200 28789 26927 25543]
[158302 65533 49712 42480 37347 33882 30987 28696 27225 25836]
[157824 65627 50432 42328 37307 33900 30787 29006 26975 25814]

Upvotes: 4

Related Questions