Reputation: 165
I'm trying to calculate areas that are above zero and under a curve. My curve has discrete x
and y
values that look like the example below.
y <- c(-1, 5, 2, 3.5, 1, 5.5, -2, 3, -1)
plot(1:length(y), y, type = "l")
abline(h = 0)
I'm trying to calculate the areas subject to vertical and horizontal geometric constraints:
That is, I need the areas of polygons A, B, C and D in the image below.
I'm struggling with two things now:
I identified the position index of local minima using: which(diff(sign(diff(y))) == 2) + 1
, but this didn't give me the upper x
value for C or lower x
value for D. How to get those points where the curve intersects zeros?
I think if I can get 1) the correct list of local minima above zero, 2) those crossover points at zero, 3) the correct list of local maxima above zero, I know all the boundary points of A, B, C and D, so computing their areas would be possible. But this does not seem straightforward to code in R. Is this really the easiest way to solve my issue, or are there better methods?
Upvotes: 2
Views: 317
Reputation: 73315
## (x, y) data
y <- c(-1, 5, 2, 3.5, 1, 5.5, -2, 3, -1)
x <- 1:length(y)
Your desired computation can be done in two steps:
step 1: piece-wise integral for above-zero proportion
If there are n
(x, y) data, there will be (n - 1)
segments. Denote (xl, yl)
as the left point of the segment, and (xr, yr)
the right point.
(yl < 0) && (yr < 0)
, the whole segment is below zero;(yl > 0) && (yr > 0)
, the whole segment is above zero;(yl < 0) && (yr > 0)
, the segment is increasing, crossing zero;(yl > 0) && (yr < 0)
, the segment is decreasing, crossing zero.In cases 3 and 4, denote (xm, 0)
to be the crossover point. xm
is easy to determine. The equation for the line segment is
y = yl + (yr - yl) * (x - xl) / (xr - xl)
By setting y
to 0
you get
xm = xl - yl * (xr - xl) / (yr - yl)
Since you want integrating the above-zero proportion of each segment, we have for each case:
(yl + yr) * (xr - xl) / 2
;(xm, 0)
and (xr, yr)
; the integral is yr * (xr - xm) / 2
;(xl, yl)
and (xm, 0)
; the integral is yl * (xm - xl) / 2
.Since you eventually want to apply the computation to long vectors, I would present the computations in an Rcpp function.
library(Rcpp)
cppFunction('NumericVector foo_cpp (NumericVector x, NumericVector y) {
int n_segments = x.size() - 1;
NumericVector integral(n_segments);
double xl, xr, yl, yr, xm; int i;
for (i = 0; i < n_segments; i++) {
xl = x[i]; xr = x[i + 1];
yl = y[i]; yr = y[i + 1];
if (yl < 0 && yr < 0) integral[i] = 0.0;
if (yl > 0 && yr > 0) integral[i] = 0.5 * (yl + yr) * (xr - xl);
if (yl < 0 && yr > 0) {
xm = xl - yl * (xr - xl) / (yr - yl);
integral[i] = 0.5 * yr * (xr - xm);
}
if (yl > 0 && yr < 0) {
xm = xl - yl * (xr - xl) / (yr - yl);
integral[i] = 0.5 * yl * (xm - xl);
}
}
return integral;
}')
z <- foo_cpp(x, y)
#[1] 2.083333 3.500000 2.750000 2.250000 3.250000 2.016667 0.900000 1.125000
I am lazy to do further code optimization. Its speed is sufficiently good for your practical use.
step 2: aggregation
You actually cut segments into chunks by local minima, and aim to compute integral on each chunk.
The position index for local minima are (as you worked out in your question):
which(diff(sign(diff(y))) == 2) + 1
#[1] 3 5 7
This means that the segments should be split by break points:
b <- which(diff(sign(diff(y))) == 2)
#[1] 2 4 6
That is,
## number of segments per chunk
n_chunks <- length(x) - 1
n_segments_per_chunk <- diff(c(0, b, n_chunks))
#[1] 2 2 2 2
## grouping index for each chunk
grp <- rep.int(seq_along(n_segments_per_chunk), n_segments_per_chunk)
#[1] 1 1 2 2 3 3 4 4
So areas of A, B, C and D are:
sapply(split(z, grp), sum)
# 1 2 3 4
#5.583333 5.000000 5.266667 2.025000
## original linear interpolation function
f <- approxfun(x, y)
## a function zeroing out below-zero part of `f`
g <- function (x) {
fx <- f(x)
ifelse(fx > 0, fx, 0)
}
## local minima
x_minima <- x[which(diff(sign(diff(y))) == 2) + 1]
## break points for numerical integration
xx <- c(x[1], x_minima, x[length(x)])
## integration will happen on:
# cbind(xx[-length(xx)], xx[-1])
# [,1] [,2]
#[1,] 1 3 ## A
#[2,] 3 5 ## B
#[3,] 5 7 ## C
#[4,] 7 9 ## D
## use `mapply`
mapply(function (lwr, upr) integrate(g, lower = lwr, upper = upr)$value,
xx[-length(xx)], xx[-1])
#[1] 5.583333 5.000000 5.266679 2.025000
Upvotes: 2