Artur K.

Reputation: 3309

Generating random data for a scatter plot

I'm testing out different JavaScript graphing frameworks, trying out line graphs and scatter plots with generated data. While it's all going quite well, I've run into trouble while trying to generate data for a scatter plot.

So it would be quite easy to do something like this in PHP, or in any other language:

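$x = 500;    // number of points to generate (example value)
$data = [];
for ($i = 0; $i < $x; $i++) {
    // Both coordinates are uniform over 0..10000,
    // so the dots cover the whole chart evenly.
    $data[] = array(
        'x' => mt_rand(0, 10000),
        'y' => mt_rand(0, 10000)
    );
}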

The result is spread pretty much uniformly over the whole chart. I'm trying to think of a way to generate random data that looks more like a real scatter plot, rather than evenly distributed dots on a page, but I can't come up with anything.

I would like to end up with something more like this random scatter plot from the Web:

[Image: random scatter plot from the Web]

The dots are dense in some parts of the plot and almost absent in the corners, but I don't want to make it completely impossible for a dot to land in a corner.

Any algorithmic ideas?

Upvotes: 0

Views: 4093

Answers (2)

Jim Mischel

Reputation: 134065

For something like the image you showed, where you have a line around which you want to scatter data, it's pretty easy. For example, imagine the line y = x * 0.75. Given that, you select an x value in the range 0..xMax (whatever your maximum X value is), and then generate a value for y with some variance. For example, if you want the Y value to fall within 10% of the expected value 90% of the time, then 90% of the time you'd generate a random value between 0.675x and 0.825x.

Say that 5% of the time the Y value is within 50% of the expected value, and 5% of the time the value is unconstrained. For each of those, you generate a Y value the same way: a random value equal to the expected Y value plus or minus up to 50% (or, in the latter case, plus or minus some very large number).

You can adjust the probabilities and the variance as appropriate.
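A minimal PHP sketch of this 90/5/5 scheme, in PHP because that is the language of the question. The line y = 0.75x comes from the answer; the helper names `randFloat()` and `scatterPoint()`, the point count, and reading "plus or minus some very large number" as "anywhere between 0 and `$yMax`" are assumptions for illustration:

```php
<?php
// Hypothetical helper: uniform float in [$min, $max].
function randFloat(float $min, float $max): float
{
    return $min + (mt_rand() / mt_getrandmax()) * ($max - $min);
}

// Hypothetical helper: one point scattered around the line y = 0.75x.
function scatterPoint(float $xMax, float $yMax): array
{
    $x = randFloat(0, $xMax);
    $expected = 0.75 * $x;      // the line the data is scattered around
    $roll = mt_rand(0, 99);     // uniform integer 0..99

    if ($roll < 90) {
        // 90% of points: within +/-10% of the expected value
        $y = randFloat(0.9 * $expected, 1.1 * $expected);
    } elseif ($roll < 95) {
        // 5% of points: within +/-50% of the expected value
        $y = randFloat(0.5 * $expected, 1.5 * $expected);
    } else {
        // 5% of points: unconstrained (assumed: anywhere in 0..$yMax)
        $y = randFloat(0, $yMax);
    }
    return ['x' => $x, 'y' => $y];
}

$data = [];
for ($i = 0; $i < 1000; $i++) {
    $data[] = scatterPoint(10000, 10000);
}
```

Because the +/-50% band can reach 1.5 times the expected value, a few points may land slightly above `$yMax`; clamp them if your chart needs a hard upper bound.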

You can also adjust the distribution of X values. For example, it looks like most of your data points are between about .15 xMax and .6 xMax. So what you want is a higher percentage of X values in that range. Imagine, then, that your X values are broken into three different ranges:

0 up to .15 * xMax - 20%
.15 * xMax up to .60 * xMax - 70%
.60 * xMax up to xMax - 10%

Generate a random integer between 0 and 99. Then:

if value < 20, generate an x value between 0 and .15 xMax
if value >= 20 and < 90, generate an x value between .15 xMax and .60 xMax
otherwise, generate an x value between .60 xMax and xMax
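The three X bands can be sketched the same way. This assumes a roll of 0..99 split at 20 and 90, so the bands receive 20%, 70% and 10% of the points; the helper names `uniformBetween()` and `bandedX()` and the value `$xMax = 10000` are illustrative assumptions:

```php
<?php
// Hypothetical helper: uniform float in [$min, $max].
function uniformBetween(float $min, float $max): float
{
    return $min + (mt_rand() / mt_getrandmax()) * ($max - $min);
}

// Hypothetical helper: roughly 20% of X values in [0, 0.15 xMax),
// 70% in [0.15 xMax, 0.60 xMax) and 10% in [0.60 xMax, xMax].
function bandedX(float $xMax): float
{
    $roll = mt_rand(0, 99);   // uniform integer 0..99
    if ($roll < 20) {
        return uniformBetween(0, 0.15 * $xMax);
    } elseif ($roll < 90) {
        return uniformBetween(0.15 * $xMax, 0.60 * $xMax);
    }
    return uniformBetween(0.60 * $xMax, $xMax);
}

$xs = [];
for ($i = 0; $i < 10000; $i++) {
    $xs[] = bandedX(10000);
}

// Count how many X values landed in the dense middle band.
$middle = count(array_filter($xs, fn($x) => $x >= 1500 && $x <= 6000));
```

Each X can then be paired with a Y generated around the line as in the previous sketch, which concentrates dots in the middle of the chart while still letting a few reach the edges.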

Upvotes: 2

Daniel Brückner

Reputation: 59705

  1. Define a function that becomes the center line of the distribution, for example c(x) = sqrt(x).

  2. Define a function that specifies the maximal allowed deviation from the center line, for example d(x) = 0.1 (x - 5)².

  3. For every x value generate one or a few y values y(x) = c(x) + 2 * (random() - 0.5) * d(x) where random() is a (pseudo) random number generator with values in [0;1].

  4. For a more realistic look use a (pseudo) random number generator that has a more interesting distribution, for example a normal distribution with standard deviation d(x).
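A rough PHP sketch of the four steps, using the example functions c(x) = sqrt(x) and d(x) = 0.1 (x - 5)² on an assumed x range of 0 to 10. The normal-distribution variant uses the Box-Muller transform, which is one possible choice of generator rather than something the answer specifies; the step size of 0.05, the three y values per x, and the function names are also assumptions:

```php
<?php
// c(x) from step 1: the center line of the distribution.
function centerLine(float $x): float
{
    return sqrt($x);
}

// d(x) from step 2: the maximal allowed deviation from the center line.
function maxDeviation(float $x): float
{
    return 0.1 * ($x - 5) ** 2;
}

// random() from step 3: uniform float in [0, 1].
function random01(): float
{
    return mt_rand() / mt_getrandmax();
}

// Standard normal sample (mean 0, std dev 1) via the Box-Muller transform.
function gaussian(): float
{
    // Shift into (0, 1] so that log() never receives 0.
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $u2 = random01();
    return sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2);
}

$uniformPoints = [];
$normalPoints = [];
for ($i = 0; $i <= 200; $i++) {
    $x = $i * 0.05;                 // x from 0 to 10 in steps of 0.05
    for ($k = 0; $k < 3; $k++) {    // a few y values per x
        // Step 3: y uniformly within c(x) +/- d(x)
        $uniformPoints[] = [
            'x' => $x,
            'y' => centerLine($x) + 2 * (random01() - 0.5) * maxDeviation($x),
        ];
        // Step 4: y normally distributed around c(x) with std dev d(x)
        $normalPoints[] = [
            'x' => $x,
            'y' => centerLine($x) + gaussian() * maxDeviation($x),
        ];
    }
}
```

Plotted, `$uniformPoints` forms a band that pinches to a single point at x = 5, where d(x) = 0. `$normalPoints` looks softer: most values stay within one d(x) of the center line, but a few stray further out, so the edges of the chart are rare rather than impossible.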

Upvotes: 1
