Reputation: 9110

Sampling from a given region

I am exploring a way to sample from the following region on a two dimensional coordinate (the blue region):

Of course, the baseline is that I can sample random number (x,y) and then check if it overlaps with the smaller box or outside the larger box. But that just wastes too many computing resource, after some quick trial.

Any suggestion or advice will be appreciated, thank you.

Upvotes: 2

Answers (3)

samgak

Reputation: 24417

Here's a solution that assumes that the blue region is symmetrical and centered on the origin, so there are 4 parameters (x1, x2, y1, y2). Imagine inside the blue region is another region with the same proportions but scaled down so that the outer border of this other region fits exactly inside the inner border of the blue region. If we randomly generate a point and it lies inside this inner region, we can map it to a point in the blue region by scaling x and y by x2/x1 and y2/y1 respectively. Now imagine another region inside this one, and another inside it, ad infinitum. Any point (except for the origin point) can then be mapped to a point in the blue region just by scaling it up the right number of times:

// generate a random point:
double x = 0.0, y = 0.0;
while(x == 0.0 && y == 0.0) // exclude the origin
{
    x = random.NextDouble() * x2;
    y = random.NextDouble() * y2;
}

// map to the blue region
while(x < x1 && y < y1)
{
    x *= (x2 / x1);
    y *= (y2 / y1);
}

// randomly choose a quadrant:
int r = random.Next(0, 4);
if((r & 1) != 0)
    x = -x;
if((r & 2) != 0)
    y = -y;

However this isn't that great because of the second while loop (the first while loop is virtually guaranteed to never run more than once). The loop can be eliminated using logarithms:

// map to the blue region
if(x < x1 && y < y1)
{
    double xpower = Math.Ceiling((Math.Log(x1) - Math.Log(x)) / Math.Log(x2/x1));
    double ypower = Math.Ceiling((Math.Log(y1) - Math.Log(y)) / Math.Log(y2/y1));
    double power = Math.Min(xpower, ypower);
    x *= Math.Pow(x2/x1, power);
    y *= Math.Pow(y2/y1, power);
}

Upvotes: 1

wiluqa

Reputation: 76

If the blue area is symmetric to the origin you could create a mapping from a random point sampled from the unit square. Consider the following pseudocode function, assuming that both rectangles are centered around the origin:

def sample():
    sample point x_base from [-1, 1] and calculate x = sign(x_base)*x1 + x_base*(x2-x1)
    sample point y_base from [-1, 1] and calculate y = sign(y_base)*y1 + y_base*(y2-y1)
    if (x,y) == (0,0) then:
        # recursively sample again in the rare case of sampling 0 for both dimensions
        return sample()
    else:
        return (x,y)

EDIT: This solution will not sample correctly from the entire blue region as pointed out by Marco13. Check out his answer for a better approach.

Upvotes: 1

Marco13

Reputation: 54649

There are probably some constraints that could allow for a simpler solution.

So the following might not be a satisfactory solution for your case!

But it's a very generic solution, that's why I hope it is OK to post it here.

First of all, from the drawing, it looks like the rectangles are always centered at the origin (both of them). If this is a valid assumption, the parts of the following solution might be simplified.

Then, it is not entirely clear how the "baseline" solution that you suggested should be used. You suggested to generate points (x,y), and for each one, you check whether it is contained in the inner rectangle. If it is contained in the inner rectangle, then it is discarded.

Now imagine you want to sample 100 points from the blue area. How many points do you have to generate in order to be sure that you found 100 points that have not been discarded?

This cannot be solved deterministically. Or more formally: You cannot provide a totally correct implementation of that. The random number generator could always generate points that are in the inner rectangle, and thus be discarded. Of course, it will not do this in practice, but you cannot prove this, and that's the point.

If the inner rectangle is "large" compared to the outer one, this will have practical implications. You might have to generate a few million points just to obtain the 100 ones that are in the narrow margin between the inner and the outer rectangle.

However, the following is a solution that does not suffer from the aforementioned problems. This comes at a cost: It is not a particularly efficient solution (although, as mentioned above, the "relative efficiency" compared to the baseline solution depends on the sizes of the rectangles and the usage pattern).

Assuming that the coordinates of the corner points are given as in this image:

(x0,y0)                        (x3,y3)
   O------------------------------O
   |                              |
   |  (ix0,iy0)        (ix3,iy3)  |
   |      O----------------O      |
   |      |                |      |
   |      |                |      |
   |      |                |      |
   |      |                |      |
   |      |                |      |
   |      O----------------O      |
   |  (ix1,iy1)        (ix2,iy2)  |
   |                              |
   O------------------------------O
(x1,y1)                        (x2,y2)

(Note that the coordinates are arbitrary, and the rectangles are not necessarily centered at the origin)

From that, you can compute regions that may contain points:

   O------O----------------O------O
   |      |                |      |
   |  R0  |       R1       |  R2  |
   O------O----------------O------|
   |      |                |      |
   |      |                |      |
   |  R2  |                |  R4  |
   |      |                |      |
   |      |                |      |
   O------O----------------O------O
   |  R5  |       R6       |  R7  |
   |      |                |      |
   O------O----------------O------O

Now, when you want to sample n points, then you can, for each point, randomly select one of these regions, and place the point at a random position inside this region.

A caveat is then the selection of the region: The region should be chosen with a probability that corresponds to the area of the region, relative to the total area of all regions. Pragmatically, you can compute the total area of all regions (as outer.w*outer.h-inner.w*inner.h), and then compute the cumulative probability distribution for a point ending up in one of the regions R0...R7. From these cumulative distributions, you can map a random value between 0.0 and 1.0 to the appropriate region.

The advantages of this approach are

it works for any rectangles (as long as the outer one contains the inner one...)
you can directly specify the number of points that should be generated, beforehand, and it will generate exactly the right number of points
it can be implemented deterministically (i.e. totally correct)

Here is an example showing the result, dragging the slider to generate 1...2000 points:

It was generated with the following MCVE. It is implemented in Java, but as long as you have some structures like Point and Rectangle, it should be fairly trivial to port this to other languages:

import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JSlider;
import javax.swing.SwingUtilities;

public class RegionNoiseTest
{
    public static void main(String[] args)
    {
        SwingUtilities.invokeLater(() -> createAndShowGUI());
    }

    private static void createAndShowGUI()
    {
        JFrame f = new JFrame();
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        f.getContentPane().setLayout(new BorderLayout());

        RegionNoiseTestPanel panel = 
            new RegionNoiseTestPanel();
        f.getContentPane().add(panel, BorderLayout.CENTER);

        JSlider nSlider = new JSlider(1, 2000, 1);
        nSlider.addChangeListener(e -> 
        {
            panel.generatePoints(nSlider.getValue());
        });
        nSlider.setValue(100);
        f.getContentPane().add(nSlider, BorderLayout.SOUTH);

        f.setSize(500,450);
        f.setLocationRelativeTo(null);
        f.setVisible(true);
    }

}

class RegionNoiseTestPanel extends JPanel
{
    private final Rectangle2D outer;
    private final Rectangle2D inner;
    private List<Point2D> points;


    RegionNoiseTestPanel()
    {
        outer = new Rectangle2D.Double(50, 50, 400, 300);
        inner = new Rectangle2D.Double(90, 100, 300, 200);
    }

    public void generatePoints(int n)
    {
        this.points = createPoints(n, outer, inner, new Random(0));
        repaint();
    }

    @Override
    protected void paintComponent(Graphics gr)
    {
        super.paintComponent(gr);
        Graphics2D g = (Graphics2D)gr;
        g.setRenderingHint(
            RenderingHints.KEY_ANTIALIASING, 
            RenderingHints.VALUE_ANTIALIAS_ON);

        g.setColor(new Color(220, 220, 220));
        g.fill(outer);
        g.setColor(new Color(160, 160, 160));
        g.fill(inner);

        if (points != null)
        {
            g.setColor(Color.BLUE);
            for (Point2D p : points)
            {
                double r = 2;
                double x = p.getX();
                double y = p.getY();
                g.fill(new Ellipse2D.Double(x - r, y - r, r + r, r + r));
            }
        }


    }

    private static List<Point2D> createPoints(
        int n, Rectangle2D outer, Rectangle2D inner, Random random)
    {
        List<Rectangle2D> regions = computeRegions(outer, inner);
        double cumulativeRegionAreas[] = new double[8];
        double outerArea = outer.getWidth() * outer.getHeight();
        double innerArea = inner.getWidth() * inner.getHeight();
        double relevantArea = outerArea - innerArea;
        double areaSum = 0;
        for (int i = 0; i < regions.size(); i++)
        {
            Rectangle2D region = regions.get(i);
            double area = region.getWidth() * region.getHeight();
            areaSum += area;
            cumulativeRegionAreas[i] = areaSum / relevantArea;
        }

        List<Point2D> points = new ArrayList<Point2D>();
        for (int i=0; i<n; i++)
        {
            points.add(createPoint(
                regions, cumulativeRegionAreas, random));
        }
        return points;
    }

    private static List<Rectangle2D> computeRegions(
        Rectangle2D outer, Rectangle2D inner)
    {
        List<Rectangle2D> regions = new ArrayList<Rectangle2D>();
        for (int r = 0; r < 8; r++)
        {
            regions.add(createRegion(outer, inner, r));
        }
        return regions;
    }

    private static Point2D createPoint(
        List<Rectangle2D> regions, 
        double normalizedCumulativeRegionAreas[], 
        Random random)
    {
        double alpha = random.nextDouble();
        int index = Arrays.binarySearch(normalizedCumulativeRegionAreas, alpha);
        if (index < 0)
        {
            index = -(index + 1);
        }
        Rectangle2D region = regions.get(index);
        double minX = region.getMinX();
        double minY = region.getMinY();
        double maxX = region.getMaxX();
        double maxY = region.getMaxY();
        double x = minX + random.nextDouble() * (maxX - minX);
        double y = minY + random.nextDouble() * (maxY - minY);
        return new Point2D.Double(x, y);
    }

    private static Rectangle2D createRegion(
        Rectangle2D outer, Rectangle2D inner, int region)
    {
        double minX = 0;
        double minY = 0;
        double maxX = 0;
        double maxY = 0;
        switch (region) 
        {
            case 0:
                minX = outer.getMinX();
                minY = outer.getMinY();
                maxX = inner.getMinX();
                maxY = inner.getMinY();
                break;

            case 1:
                minX = inner.getMinX();
                minY = outer.getMinY();
                maxX = inner.getMaxX();
                maxY = inner.getMinY();
                break;

            case 2:
                minX = inner.getMaxX();
                minY = outer.getMinY();
                maxX = outer.getMaxX();
                maxY = inner.getMinY();
                break;

            case 3:
                minX = outer.getMinX();
                minY = inner.getMinY();
                maxX = inner.getMinX();
                maxY = inner.getMaxY();
                break;

            case 4:
                minX = inner.getMaxX();
                minY = inner.getMinY();
                maxX = outer.getMaxX();
                maxY = inner.getMaxY();
                break;

            case 5:
                minX = outer.getMinX();
                minY = inner.getMaxY();
                maxX = inner.getMinX();
                maxY = outer.getMaxY();
                break;

            case 6:
                minX = inner.getMinX();
                minY = inner.getMaxY();
                maxX = inner.getMaxX();
                maxY = outer.getMaxY();
                break;

            case 7:
                minX = inner.getMaxX();
                minY = inner.getMaxY();
                maxX = outer.getMaxX();
                maxY = outer.getMaxY();
                break;
        }
        return new Rectangle2D.Double(minX, minY, maxX - minX, maxY - minY);
    }

}

I'd still be curious whether someone finds an elegant, deterministic approach where it is not necessary to define sorts of "regions" for the points to be generated...

Upvotes: 2

Sampling from a given region

Answers (3)

Related Questions