Kirin

Reputation: 31

How to compute area under the curve (intersection may occur) with the original dataset?

I have a number of datasets of x's and y's. For each dataset, I plot the points (x, y) in R. The resulting plots generally look like either type A or type B. Type B has an intersection, while type A does not.

My question: given a new dataset, is it possible to calculate (in R) the red shaded area under the curve, as indicated in the type A and type B plots, without looking at the plot first?

The main challenges are:

1) How to determine whether the dataset will generate type A or type B in R?

2) How to compute the red shaded area in type B using the dataset with R?

Here is the code that produces the dataset behind the type B curve.

set.seed(300)
predicted_value_A = c(rbeta(300, 9, 2), rbeta(700, 2, 4), rbeta(10000, 2, 4))
predicted_value_B = c(rbeta(1000, 4, 3), rbeta(10000, 2, 3))
real_value = c(rep(1, 1000), rep(0, 10000))

library(ROCR)
library(ggplot2)

predB <- prediction(predicted_value_B, real_value)
perfB <- performance(predB, measure = "mat", x.measure = "f")

yB <- attr(perfB, "y.values")[[1]]

yB <- (yB + 1)/2

xB <- attr(perfB, "x.values")[[1]]  

# dataset that generates type B curve
dfB <- data.frame(X = xB, Y= yB)

ggplot(dfB, aes(x=X, y=Y, ymin=0, ymax=1, xmin=0, xmax=1)) + geom_point(size = 0.2, shape = 21, fill="white") +
  ggtitle("Type B curve") +
  theme(plot.title=element_text(hjust=0.5))


Upvotes: 3

Views: 125

Answers (1)

Rob

Reputation: 277

Here is a bit of code that shades the region between two curves given as sets of (x, y) points and computes its area, using an approximation with small rectangles. It assumes the x values are evenly spaced and dense enough for the rectangular approximation to work well.

# sample dataset
x <- seq(0,2,length.out=1000)
y1 <- x
y2 <- sin(x*pi)+x

# plot
plot(x,y1,type='l',ylab='y')
lines(x,y2)

# shade the plot
## not efficient but works
dx <- x[2]-x[1]
area <- 0

# shade plot and calculate area
## uses a rectangular strip approximation
## assumes even spacing in x. Could also calculate the dx in each step if it changes
for (i in seq_along(x)) {

  # corners of the strip centered on x[i]; the rectangle is the same
  # whichever of y1[i] and y2[i] is larger
  cord.x <- c(x[i]-dx/2, x[i]-dx/2, x[i]+dx/2, x[i]+dx/2)
  cord.y <- c(y1[i], y2[i], y2[i], y1[i])

  # draw the polygons
  polygon(cord.x, cord.y, col = 'pink', border = NA)

  # sum to the area
  area <- area + abs(y2[i]-y1[i])*dx
}
area

sample shaded plot by rectangular approximation
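
The x values that come out of ROCR in the question need not be evenly spaced, so the fixed dx above may not carry over directly. Below is a minimal sketch of the same idea using the trapezoid rule with a per-strip width, plus a sign-change check for whether the two curves cross. The helper names area_between and curves_cross are hypothetical (they are not from any package), and the sketch assumes both curves are sampled at the same x values, as in the sample dataset above. It may not match the exact region shaded in the question's plots.

```r
# Hypothetical helper (an assumption, not a library function):
# area between two curves sampled at the same, possibly uneven, x values.
area_between <- function(x, y1, y2) {
  ord <- order(x)               # the trapezoid rule needs x in increasing order
  x <- x[ord]
  d <- abs(y2[ord] - y1[ord])   # vertical gap between the curves at each sample
  dx <- diff(x)                 # width of each strip, allowed to vary
  sum(dx * (head(d, -1) + tail(d, -1)) / 2)
}

# Hypothetical helper: TRUE if y2 - y1 changes sign, i.e. the curves cross.
curves_cross <- function(y1, y2) {
  s <- sign(y2 - y1)
  s <- s[s != 0]                # ignore samples where the curves only touch
  any(diff(s) != 0)
}

# same sample dataset as above
x <- seq(0, 2, length.out = 1000)
y1 <- x
y2 <- sin(x * pi) + x

area_between(x, y1, y2)         # should be close to the exact value 4/pi
curves_cross(y1, y2)
```

In a strip where the curves cross, taking the trapezoid of the absolute gap slightly overestimates the area, so this is only as good as the sampling is dense. A sign change could serve as a rough way to tell a crossing case from a non-crossing one without plotting, under the assumption that both curves share the same x grid.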

Upvotes: 0
