num8ers
num8ers

Reputation: 63

Coefficient of Divergence - how to code loop more efficiently?

I have a matrix of 1500 rows and 20 columns and would like to calculate the Coefficient of Divergence between columns (ie. sites). Each row is my hourly observation per site.

Calculation for COD between two sites is as stated in https://www.scribd.com/document/364585909/Coefficient-of-Divergence-COD

I have created a simpler dataset to work with:

x1 <- c(10, 20, 30, 40, 50)
x2 <- c(11, 21, 31, 41, 51)
x3 <- c(12, 22, 32, 42, 52)
x4 <- c(13, 23, 33, 43, 53)
x5 <- c(14, 24, 34, 44, 54)
x6 <- c(15, 25, 35, 44, 55)

xmat <- cbind(x1,x2,x3,x4,x5,x6)

first COD

site1 <- xmat

for (i in 2:ncol(site1))   

     site1[,i] = (xmat[,i]-xmat[,1])/(xmat[,i]+xmat[,1])
     site1 <- site1[,2:6]
     site1 <- sapply(site1, function(x) x^2)
     COD-site1 <- data.frame(sqrt(colMeans(site1, na.rm=TRUE)))

second COD

site2 <- xmat

for (i in 3:ncol(site2))

    site2[,i] = (xmat[,i]-xmat[,2])/(xmat[,i]+xmat[,2])
    site2 <- site2[,3:6]
    site2 <- sapply(site2, function(x) x^2)
    COD-site2 <- data.frame(sqrt(colMeans(site2, na.rm=TRUE)))

etc....

I'm having a problem on how to loop through columns and create a new data frame for CODs. In my simple dataset, it's over 6 columns but in my actual dataset, I have 20 columns and don't really want to copy and paste 20 times. I'm a novice in writing function and loop for R and would appreciate a lending hand. Many thanks

Upvotes: 1

Views: 435

Answers (1)

Parfait
Parfait

Reputation: 107652

With matrix objects, elemetwise operations of same sizes can be handled without loops such as for and sapply where you can run arithmetic directly on whole objects in vectorized processes:

Site 1

site1 <- xmat
site1[,2:ncol(site1)] <- (xmat[,2:ncol(site1)]-xmat[,1])/(xmat[,2:ncol(site1)]+xmat[,1])
site1 <- site1[,2:6]^2
COD_site1 <- data.frame(result=sqrt(colMeans(site1, na.rm=TRUE)))
COD_site1

Site 2

site2 <- xmat
site2[,3:ncol(site2)] <- (xmat[,3:ncol(site2)]-xmat[,2])/(xmat[,3:ncol(site2)]+xmat[,2])
site2 <- site2[,3:6]^2
COD_site2 <- data.frame(result=sqrt(colMeans(site2, na.rm=TRUE)))
COD_site2

To iterate across, generalize your processing and run in a loop like lapply to build a list of return values. Have the loop iterate across the number of columns of original matrix minus first item (i.e., 2, 3, 4, 5, 6):

result_list <- lapply(seq(ncol(xmat))[-1], function(i) {

  tryCatch({
     site <- xmat
     site[, i:ncol(site)] <- (xmat[,i:ncol(site)]-xmat[,i-1])/(xmat[,i:ncol(site)]+xmat[,i-1])
     site <- as.matrix(site[,i:ncol(xmat)]^2)
     COD_site <- data.frame(result=sqrt(colMeans(site, na.rm=TRUE)))

     # RETURN LIST OF BOTH ITEMS
     return(list(site, COD_site))

   }, error = function(e) list(NA, NA)
  )

})

Even name list elements according to site:

names(df_list) <- paste0("COD_site", seq(ncol(xmat)-1))

Even extract lists into their own types:

mat_lists <- lapply(result_list, `[[`, 1)

df_lists <- lapply(result_list, `[[`, 2)

mat_lists

mat_lists
# $COD_site1
#                x2           x3           x4          x5          x6
# [1,] 0.0022675737 0.0082644628 0.0170132325 0.027777778 0.040000000
# [2,] 0.0005948840 0.0022675737 0.0048674959 0.008264463 0.012345679
# [3,] 0.0002687450 0.0010405827 0.0022675737 0.003906250 0.005917160
# [4,] 0.0001524158 0.0005948840 0.0013064305 0.002267574 0.002267574
# [5,] 0.0000980296 0.0003844675 0.0008483363 0.001479290 0.002267574

# $COD_site2
#                x3           x4           x5           x6
# [1,] 8.218953e-05 5.787037e-04 1.728000e-03 3.641329e-03
# [2,] 1.257751e-05 9.391435e-05 2.962963e-04 6.575162e-04
# [3,] 3.999248e-06 3.051758e-05 9.831589e-05 2.226118e-04
# [4,] 1.748903e-06 1.349746e-05 4.396499e-05 4.396499e-05
# [5,] 9.151417e-07 7.111971e-06 2.332362e-05 5.373563e-05

# $COD_site3
#                x4           x5           x6
# [1,] 2.560000e-06 3.501278e-05 1.524158e-04
# [2,] 2.438653e-07 3.573458e-06 1.659945e-05
# [3,] 5.602045e-08 8.432265e-07 4.019627e-06
# [4,] 1.915686e-08 2.925002e-07 2.925002e-07
# [5,] 8.227025e-09 1.267350e-07 6.179451e-07

# $COD_site4
#                x5           x6
# [1,] 6.969172e-08 1.859344e-06
# [2,] 4.360243e-09 1.255867e-07
# [3,] 7.406721e-10 2.200926e-08
# [4,] 2.006336e-10 2.006336e-10
# [5,] 7.129862e-11 2.177866e-09

# $COD_site5
#              [,1]
# [1,] 1.681171e-09
# [2,] 7.224762e-11
# [3,] 9.266281e-12
# [4,] 0.000000e+00
# [5,] 5.962673e-13

df_lists

df_lists
# $COD_site1
#        result
# x2 0.02600634
# x3 0.05010383
# x4 0.07253009
# x5 0.09348300
# x6 0.11206961

# $COD_site2
#         result
# x3 0.004504006
# x4 0.012031168
# x5 0.020927975
# x6 0.030394597

# $COD_site3
#          result
# x4 0.0007599039
# x5 0.0028230728
# x6 0.0058982253

# $COD_site4
#          result
# x5 0.0001225272
# x6 0.0006339273

# $COD_site5
#         result
# 1 1.877915e-05

Upvotes: 1

Related Questions