Find areas of polygons: integration under vertical and horizontal geometric constraints

Question

I'm trying to calculate areas that are above zero and under a curve. My curve has discrete x and y values that look like the example below.

y <- c(-1, 5, 2, 3.5, 1, 5.5, -2, 3, -1)
plot(1:length(y), y, type = "l")
abline(h = 0)

I'm trying to calculate the areas subject to vertical and horizontal geometric constraints:

(vertical) under the curve but above zero;
(horizontal) between two adjacent local minima.

That is, I need the areas of polygons A, B, C and D in the image below.

I'm struggling with two things now:

I identified the position index of local minima using: which(diff(sign(diff(y))) == 2) + 1, but this didn't give me the upper x value for C or lower x value for D. How to get those points where the curve intersects zeros?
I think if I can get 1) the correct list of local minima above zero, 2) those crossover points at zero, 3) the correct list of local maxima above zero, I know all the boundary points of A, B, C and D, so computing their areas would be possible. But this does not seem straightforward to code in R. Is this really the easiest way to solve my issue, or are there better methods?

Zheyuan Li · Accepted Answer

## (x, y) data
y <- c(-1, 5, 2, 3.5, 1, 5.5, -2, 3, -1)
x <- 1:length(y)

Analytical method

Your desired computation can be done in two steps:

piece-wise integral on each linear segment. There is no difficulty if you only want to integrate the above-zero proportion (see details below);
aggregate piece-wise results appropriately for areas of A, B, C and D (see details below).

step 1: piece-wise integral for above-zero proportion

If there are n (x, y) data, there will be (n - 1) segments. Denote (xl, yl) as the left point of the segment, and (xr, yr) the right point.

if (yl < 0) && (yr < 0), the whole segment is below zero;
if (yl > 0) && (yr > 0), the whole segment is above zero;
if (yl < 0) && (yr > 0), the segment is increasing, crossing zero;
if (yl > 0) && (yr < 0), the segment is decreasing, crossing zero.

In cases 3 and 4, denote (xm, 0) to be the crossover point. xm is easy to determine. The equation for the line segment is

y = yl + (yr - yl) * (x - xl) / (xr - xl)

By setting y to 0 you get

xm = xl - yl * (xr - xl) / (yr - yl)

Since you want integrating the above-zero proportion of each segment, we have for each case:

integral is 0;
the area is a trapezoid and the integral is (yl + yr) * (xr - xl) / 2;
the area is a triangle between (xm, 0) and (xr, yr); the integral is yr * (xr - xm) / 2;
the area is a triangle between (xl, yl) and (xm, 0); the integral is yl * (xm - xl) / 2.

Since you eventually want to apply the computation to long vectors, I would present the computations in an Rcpp function.

library(Rcpp)

cppFunction('NumericVector foo_cpp (NumericVector x, NumericVector y) {
  int n_segments = x.size() - 1;
  NumericVector integral(n_segments);
  double xl, xr, yl, yr, xm; int i;
  for (i = 0; i < n_segments; i++) {
    xl = x[i]; xr = x[i + 1];
    yl = y[i]; yr = y[i + 1];
    if (yl < 0 && yr < 0) integral[i] = 0.0;
    if (yl > 0 && yr > 0) integral[i] = 0.5 * (yl + yr) * (xr - xl);
    if (yl < 0 && yr > 0) {
      xm = xl - yl * (xr - xl) / (yr - yl);
      integral[i] = 0.5 * yr * (xr - xm);
      }
    if (yl > 0 && yr < 0) {
      xm = xl - yl * (xr - xl) / (yr - yl);
      integral[i] = 0.5 * yl * (xm - xl);
      }
    }
  return integral;
  }')

z <- foo_cpp(x, y)
#[1] 2.083333 3.500000 2.750000 2.250000 3.250000 2.016667 0.900000 1.125000

I am lazy to do further code optimization. Its speed is sufficiently good for your practical use.

step 2: aggregation

You actually cut segments into chunks by local minima, and aim to compute integral on each chunk.

The position index for local minima are (as you worked out in your question):

which(diff(sign(diff(y))) == 2) + 1
#[1] 3 5 7

This means that the segments should be split by break points:

b <- which(diff(sign(diff(y))) == 2)
#[1] 2 4 6

That is,

## number of segments per chunk
n_chunks <- length(x) - 1
n_segments_per_chunk <- diff(c(0, b, n_chunks))
#[1] 2 2 2 2

## grouping index for each chunk
grp <- rep.int(seq_along(n_segments_per_chunk), n_segments_per_chunk)
#[1] 1 1 2 2 3 3 4 4

So areas of A, B, C and D are:

sapply(split(z, grp), sum)
#       1        2        3        4 
#5.583333 5.000000 5.266667 2.025000

Numerical method

## original linear interpolation function
f <- approxfun(x, y)

## a function zeroing out below-zero part of `f`
g <- function (x) {
  fx <- f(x)
  ifelse(fx > 0, fx, 0)
  }

## local minima
x_minima <- x[which(diff(sign(diff(y))) == 2) + 1]

## break points for numerical integration
xx <- c(x[1], x_minima, x[length(x)])

## integration will happen on:
# cbind(xx[-length(xx)], xx[-1])
#     [,1] [,2]
#[1,]    1    3  ## A
#[2,]    3    5  ## B
#[3,]    5    7  ## C
#[4,]    7    9  ## D

## use `mapply`
mapply(function (lwr, upr) integrate(g, lower = lwr, upper = upr)$value,
       xx[-length(xx)], xx[-1])
#[1] 5.583333 5.000000 5.266679 2.025000

Find areas of polygons: integration under vertical and horizontal geometric constraints

Answers (1)

Analytical method

Numerical method

Related Questions