sermomon
sermomon

Reputation: 317

Area Under the Curve using Simpson's rule in R

I would like to compute the Area Under the Curve defined by a set of experimental values. I created a function to calculate an aproximation of the AUC using the Simpson's rule as I saw in this post. However, the function only works when it receives a vector of odd length. How can I modify the code to add the area of the last trapezoid when the input vector has an even length.

AUC <- function(x, h=1){
  # AUC function computes the Area Under the Curve of a time serie using
  # the Simpson's Rule (numerical method).
  # https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
  
  # Arguments
  # x: (vector) time serie values
  # h: (int) temporal resolution of the time serie. default h=1
  
  n = length(x)-1
  
  xValues = seq(from=1, to=n, by=2)
  
  sum <- list()
  for(i in 1:length(xValues)){
    n_sub <- xValues[[i]]-1
    n <- xValues[[i]]
    n_add <- xValues[[i]]+1
    
    v1 <- x[[n_sub+1]]
    v2 <- x[[n+1]]
    v3 <- x[[n_add+1]]
    
    s <- (h/3)*(v1+4*v2+v3)
    sum <- append(sum, s)
  }
  sum <- unlist(sum)
  
  auc <- sum(sum)
  
  return(auc)
  
}

Here a data example:

smoothed = c(0.3,0.317,0.379,0.452,0.519,0.573,0.61,0.629,0.628,0.613,0.587,0.556,0.521,
             0.485,0.448,0.411,0.363,0.317,0.273,0.227,0.185,0.148,0.12,0.103,0.093,0.086,
             0.082,0.079,0.076,0.071,0.066,0.059,0.053,0.051,0.052,0.057,0.067,0.081,0.103,
             0.129,0.165,0.209,0.252,0.292,0.328,0.363,0.398,0.431,0.459,0.479,0.491,0.494,
             0.488,0.475,0.457,0.43,0.397,0.357,0.316,0.285,0.254,0.227,0.206,0.189,0.181,
             0.171,0.157,0.151,0.162,0.192,0.239)

Upvotes: 3

Views: 697

Answers (3)

Jeffrey Harkness
Jeffrey Harkness

Reputation: 96

Here, I show example code that uses the Simpson's 1/3 and 3/8 rules in tandem for the numerical integration of data. As always, the usual caveats about the possibility of coding errors or compatibility issues apply.

The output at the end compares the numerical estimates of this algorithm with the trapezoidal rule using R's "integrate" function.

#Algorithm adapted from:
#Numerical Methods for Engineers, Seventh Edition, 
#By Chapra and Canale, page 623
#Modified to accept data instead of functional values
#Modified by: Jeffrey Harkness, M.S.

##Begin Simpson's rule function code

simp13 <- function(dat, h = 1){
ans = 2*h*(dat[1] + 4*dat[2] + dat[3])/6
return(ans)}

simp13m <- function(dat, h = 1){
summ <- dat[1]
n <- length(dat)
nseq <- seq(2,(n-2),2)
for(i in nseq){
summ <- summ + 4*dat[i] + 2*dat[i+1]}
summ <- summ + 4*dat[n-1] + dat[n]
result <- (h*summ)/3
return(result)}

simp38 <- function(dat, h = 1){
ans <-  3*h*(dat[1] + 3*sum(dat[2:3]) + dat[4])/8
return(ans)}

simpson = function(dat, h = 1){
hin = h
len = length(dat)
comp <- len %% 2
##number of segments
if(len == 2){
ans = sum(dat)/2*h} ##n = 2 is the trapezoidal rule
if(len == 3){
ans = simp13(dat, h = hin)}
if(len == 4){
ans = simp38(dat,h = hin)}
if(len == 6){
ans <- simp38(dat[1:4],h = hin) + simp13(dat[4:len],h = hin)}
if(len > 6 & comp == 0){
ans = simp38(dat[1:4],h = hin) + simp13m(dat[4:len],h = hin)}
if(len >= 5 & comp == 1){
ans = simp13m(dat,h = hin)}
return(ans)}

##End Simpson's rule function code

This next section of code shows the performance comparison. This code can easily be altered for different test functions and cases.

The precision difference tends to change with the sample size and test function used; this example is not intended to imply that the difference is always this pronounced.

#other algorithm for comparison purposes, from Allan Cameron above
oa <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
 
#Testing and algorithm comparison code
simans = NULL; oaans = NULL; simerr = NULL; oaerr = NULL; mp = NULL
for( j in 1:10){
n = j
#f = function(x) cos(x) + 2   ##Test functions
f = function(x) 0.2 + 25*x - 200*x^2 + 675*x^3 - 900*x^4 + 400*x^5
a = 0;b = 10
h = (b-a)/n
datain = seq(a,b,by = h)
 
preans = integrate(f,a,b)$value #precise numerical estimate of test function
simans[j] = simpson(f(datain), h = h)
oaans[j] = oa(f(datain), h = h)
(simerr[j] = abs(simans[j] - preans)/preans * 100)
(oaerr[j] = abs(oaans[j] - preans)/preans * 100)
 
mp[j] = simerr[j] < oaerr[j]
}
    (outframe = data.frame("simpsons percent diff" = simerr,"trapezoidal percent diff" = oaerr, "more precise?" = mp, check.names = F))
   simpsons percent diff trapezoidal percent diff more precise?
1           214.73489738               214.734897         FALSE
2            15.07958148                64.993410          TRUE
3             6.70203621                29.816799          TRUE
4             0.94247384                16.955208          TRUE
5             0.54830021                10.905620          TRUE
6             0.18616767                 7.593825          TRUE
7             0.12051767                 5.588209          TRUE
8             0.05890462                 4.282980          TRUE
9             0.04087107                 3.386525          TRUE
10            0.02412733                 2.744500          TRUE

Upvotes: 1

Allan Cameron
Allan Cameron

Reputation: 173793

There may be a good reason for you to prefer using Simpson's rule, but if you're just looking for a quick and efficient estimate of AUC, the trapezoid rule is far easier to implement, and does not require an even number of breaks:

AUC <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)

AUC(smoothed)
#> [1] 20.3945

Upvotes: 3

Jeffrey Harkness
Jeffrey Harkness

Reputation: 96

One recommended way to handle an even number of points and still achieve precision is to combine Simpson's 1/3 rule with Simpson's 3/8 rule, which can handle an even number of points. Such approaches can be found in (at least one or perhaps more) engineering textbooks on numerical methods.

However, as a practical matter, you can write a code chunk to check the data length and add a single trapezoid at the end, as was suggested in the last comment of the post to which you linked. I wouldn't assume that it is necessarily as precise as combining Simpson's 1/3 and 3/8 rules, but it is probably reasonable for many applications.

I would double-check my code edits below, but this is the basic idea.

AUC <- function(x, h=1){
  # AUC function computes the Area Under the Curve of a time serie using
  # the Simpson's Rule (numerical method).
  # https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
  
  # Arguments
  # x: (vector) time serie values
  # h: (int) temporal resolution of the time serie. default h=1
  
  #jh edit: check for even data length
  #and chop off last data point if even
  nn = length(x)
  if(length(x) %% 2 == 0){
  xlast = x[length(x)]
  x = x[-length(x)]
  }

  n = length(x)-1
  
  xValues = seq(from=1, to=n, by=2)
  
  sum <- list()
  for(i in 1:length(xValues)){
    n_sub <- xValues[[i]]-1
    n <- xValues[[i]]
    n_add <- xValues[[i]]+1
    
    v1 <- x[[n_sub+1]]
    v2 <- x[[n+1]]
    v3 <- x[[n_add+1]]
    
    s <- (h/3)*(v1+4*v2+v3)
    sum <- append(sum, s)
  }
  sum <- unlist(sum)
  
  auc <- sum(sum)
  ##jh edit: add trapezoid for last two data points to result
  if(nn %% 2 == 0){
  auc <- auc + (x[length(x)] + xlast)/2 * h
  }
  
  
  return(auc)
  
}


sm = smoothed[-length(smoothed)]
length(sm)
[1] 70
#even data as an example
AUC(sm)
[1] 20.17633
#original odd data
AUC(smoothed)
[1] 20.389

Upvotes: 2

Related Questions