Filling a curve with points that fit under the curve in R plot

Question

I was wondering how I can efficiently (using short R code) fill a curve with points that can fill up the area under my curve?

I have tried something without success, here is my R code:

data = rnorm(1000)     ## random data points to fill the curve

curve(dnorm(x), -4, 4) ## curve to be filled by "data" above

points(data)           ## plotting the points to fill the curve

eipi10 · Accepted Answer

Here's a method that uses interpolation to ensure that the plotted points won't exceed the height of the curve (although, if you want the actual point markers to not stick out above the curve, you'll need to set the threshold slightly below the height of the curve):

# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 

# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))

# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]

# Plot the points
points(pts, pch=16, col="red", cex=0.7)

A method for plotting exactly a desired number of points under a curve

Responding to @d.b's comment, here's a way to get exactly a desired number of points plotted under a curve:

First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:

Calculate the area under the curve as a fraction of the area of the rectangle bounded by zero and the maximum height of the curve on the vertical axis, and by the width of the curve on the horizontal axis.

The number of random points we need to generate is the target number of points, divided by the area ratio calculated above.

# Area ratio
aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))

# Target number of points under curve
n.target = 1000

# Number of random points to generate
n = ceiling(n.target/aa)

But we need more points than this to ensure we get at least n.target, because random variation will result in fewer than n.target points about half the time, once we limit the plotted points to those below the curve. So we'll add an excess.factor in order to generate more points under the curve than we need, then we'll just randomly select n.target of those points to plot. Here's a function that takes care of the entire process for a general curve.

# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {

  # Area under curve as fraction of area of plot region
  aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))

  # Number of random points to generate
  n = excess.factor*ceiling(n.target/aa)

  # Generate n random points in x-range of the data and with y value between
  # zero and the maximum y-value of the curve
  pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))

  # Using interpolation, keep only those points whose y-value is less than y(x)
  pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]

  # Randomly select only n.target points
  pts = pts[sample(1:nrow(pts), n.target), ]

  # Plot the points
  points(pts, pch=16, col="red", cex=0.7)

}

Let's run the function for the original curve:

c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 

pts.under.curve(c.pts)

Now let's test it with a different distribution:

# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001)) 

pts.under.curve(c.pts, n.target=200)

Filling a curve with points that fit under the curve in R plot

Answers (2)

A method for plotting exactly a desired number of points under a curve

Related Questions