Reputation: 31
I have a bunch of datasets of x’s and y’s. For each dataset, I plot the points (x, y) in R. The resulting plots generally look like either type A or type B. Type B has an intersection, while type A does not.
My question: given a new dataset, is it possible to calculate (in R) the red shaded area under the curve, as indicated in the type A and type B plots, without looking at the plot?
The main challenges are:
1) How to determine whether the dataset will generate type A or type B in R?
2) How to compute the red shaded area in type B using the dataset with R?
Here is the code that produces the dataset behind the type B curve.
set.seed(300)
predicted_value_A = c(rbeta(300, 9, 2), rbeta(700, 2, 4), rbeta(10000, 2, 4))
predicted_value_B = c(rbeta(1000, 4, 3), rbeta(10000, 2, 3))
real_value = c(rep(1, 1000), rep(0, 10000))
library(ROCR)
library(ggplot2)
predB <- prediction(predicted_value_B, real_value)
perfB <- performance(predB, measure = "mat", x.measure = "f")
yB <- attr(perfB, "y.values")[[1]]
yB <- (yB + 1)/2
xB <- attr(perfB, "x.values")[[1]]
# dataset that generates type B curve
dfB <- data.frame(X = xB, Y= yB)
ggplot(dfB, aes(x=X, y=Y, ymin=0, ymax=1, xmin=0, xmax=1 )) + geom_point(size = 0.2, shape = 21, fill="white")+
ggtitle("Type B curve") +
theme(plot.title=element_text(hjust=0.5))
Upvotes: 3
Views: 125
Reputation: 277
Here is a bit of code that shades the region between two curves given as sets of (x, y) points and approximates its area with small rectangles. It assumes the x values are evenly spaced and dense enough for the rectangular approximation to work well.
# sample dataset
x <- seq(0,2,length.out=1000)
y1 <- x
y2 <- sin(x*pi)+x
# plot
plot(x,y1,type='l',ylab='y')
lines(x,y2)
# shade the plot
## not efficient but works
dx <- x[2]-x[1]
area <- 0
# shade plot and calculate area
## uses a rectangular strip approximation
## assumes even spacing in x. Could also calculate the dx in each step if it changes
for (i in seq_along(x)) {
if (y1[i] < y2[i]) {
cord.x <- c(x[i]-dx/2,x[i]-dx/2,x[i]+dx/2,x[i]+dx/2)
cord.y <- c(y1[i],y2[i],y2[i],y1[i])
} else {
cord.x <- c(x[i]-dx/2,x[i]-dx/2,x[i]+dx/2,x[i]+dx/2)
cord.y <- c(y2[i],y1[i],y1[i],y2[i])
}
# draw the polygons
polygon(cord.x, cord.y, col = 'pink', border = NA)
# sum to the area
area <- area + abs(y2[i]-y1[i])*dx
}
area
[Figure: sample shaded plot by rectangular approximation]
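If the x values are not evenly spaced, the same idea can be written without a loop by taking the strip width from diff(x). The following is only a rough sketch, not part of the approach above: area_between and count_crossings are hypothetical helper names, the trapezoidal rule is swapped in for the rectangles, and it assumes both curves are sampled at the same x values. Counting sign changes of y2 - y1 is one possible way to tell whether the two curves cross at all, assuming that is what separates the two plot types.

```r
# Hypothetical helper (an assumption, not from the answer above): area between
# two curves sampled at the same, possibly unevenly spaced, x values.
area_between <- function(x, y1, y2) {
  ord <- order(x)                 # sort by x so every strip width is positive
  x   <- x[ord]
  gap <- abs(y2[ord] - y1[ord])   # vertical distance between the curves
  dx  <- diff(x)                  # width of each strip, allowed to vary
  # trapezoidal rule: mean of the two side heights times the strip width
  sum(dx * (head(gap, -1) + tail(gap, -1)) / 2)
}

# Hypothetical helper: number of sign changes of y2 - y1, i.e. how many times
# the two curves cross between the first and last sample.
count_crossings <- function(x, y1, y2) {
  ord <- order(x)
  s <- sign(y2[ord] - y1[ord])
  s <- s[s != 0]                  # drop points where the curves touch exactly
  sum(diff(s) != 0)
}

# same curves as in the sample dataset, but on an unevenly spaced grid
x  <- 2 * seq(0, 1, length.out = 1000)^2
y1 <- x
y2 <- sin(x * pi) + x

area_uneven <- area_between(x, y1, y2)
n_cross     <- count_crossings(x, y1, y2)
area_uneven   # compare with the exact value 4/pi for these two curves
n_cross
```

For these sample curves the exact area is the integral of |sin(pi*x)| from 0 to 2, which is 4/pi, so the result can be checked against that. Near a crossing the trapezoid on the absolute gap is slightly off, which only matters when the points are sparse.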
Upvotes: 0