rnorouzian
rnorouzian

Reputation: 7517

Filling a curve with points that fit under the curve in R plot

I was wondering how I can efficiently (using short R code) fill a curve with points that can fill up the area under my curve?

enter image description here I have tried something without success, here is my R code:

data = rnorm(1000)     ## random data points to fill the curve

curve(dnorm(x), -4, 4) ## curve to be filled by "data" above

points(data)           ## plotting the points to fill the curve

Upvotes: 3

Views: 905

Answers (2)

eipi10
eipi10

Reputation: 93871

Here's a method that uses interpolation to ensure that the plotted points won't exceed the height of the curve (although, if you want the actual point markers to not stick out above the curve, you'll need to set the threshold slightly below the height of the curve):

# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 

# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))

# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]

# Plot the points
points(pts, pch=16, col="red", cex=0.7)

enter image description here

A method for plotting exactly a desired number of points under a curve

Responding to @d.b's comment, here's a way to get exactly a desired number of points plotted under a curve:

First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:

  1. Calculate the area under the curve as a fraction of the area of the rectangle bounded by zero and the maximum height of the curve on the vertical axis, and by the width of the curve on the horizontal axis.
  2. The number of random points we need to generate is the target number of points, divided by the area ratio calculated above.

    # Area ratio
    aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
    
    # Target number of points under curve
    n.target = 1000
    
    # Number of random points to generate
    n = ceiling(n.target/aa)
    

But we need more points than this to ensure we get at least n.target, because random variation will result in fewer than n.target points about half the time, once we limit the plotted points to those below the curve. So we'll add an excess.factor in order to generate more points under the curve than we need, then we'll just randomly select n.target of those points to plot. Here's a function that takes care of the entire process for a general curve.

# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {

  # Area under curve as fraction of area of plot region
  aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))

  # Number of random points to generate
  n = excess.factor*ceiling(n.target/aa)

  # Generate n random points in x-range of the data and with y value between
  # zero and the maximum y-value of the curve
  pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))

  # Using interpolation, keep only those points whose y-value is less than y(x)
  pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]

  # Randomly select only n.target points
  pts = pts[sample(1:nrow(pts), n.target), ]

  # Plot the points
  points(pts, pch=16, col="red", cex=0.7)

}

Let's run the function for the original curve:

c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 

pts.under.curve(c.pts)

enter image description here

Now let's test it with a different distribution:

# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001)) 

pts.under.curve(c.pts, n.target=200)

enter image description here

Upvotes: 4

d.b
d.b

Reputation: 32558

n_points = 10000 #A large number

#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)

#Generate 1000 random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))

#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)

#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something

#Plot points
points(p2$x, p2$y)

Upvotes: 1

Related Questions