Reputation: 85

Avoiding a loop when populating data frames in R

I have an empty data frame T_modelled with 2784 columns and 150 rows.

T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))

where

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

I filled T_modelled by column with a nested for loop, based on a formula:

for (i in 1:ncol(T_modelled)) {
  col_tmp <- colnames(T_modelled)[i]
  for (j in 1:nrow(T_modelled)) {
    z_tmp <- z[j]-0.1
    T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
    T_modelled[j ,col_tmp] <- T_tmp
  }
}

where

MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
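Reading the loop body, the formula it evaluates can be written as below. Here z' = z - 0.1 is the depth offset used in the code, and d is a label I introduce for the quantity the code writes as (omega/(2*K))^-0.5. The name d (a damping depth) is my assumption, not the asker's notation.

```latex
T(z', t) = \mathrm{MANSRT} + A_s \, e^{-z'/d} \, \sin\!\left(\omega t - \frac{z'}{d}\right),
\qquad d = \sqrt{\frac{2K}{\omega}}, \qquad z' = z - 0.1
```

Every operation here is elementwise. For a fixed t, the whole column over all depths can therefore be computed in one vectorized expression, and that is what the answers below exploit.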

I do get the desired results, but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I suspect there is an opportunity to take advantage of R's vectorized calculations; I just can't see how to incorporate the formula so that it fills T_modelled more simply.

Does anyone have ideas on how to get the same result in a faster, more "R-like" manner?

Upvotes: 2

Answers (4)

Parfait

Reputation: 107767

As with the solution to your previous question, which you accepted, consider simply using sapply to iterate through the vector t_sec_ERT. It has the same length as the number of columns you want in your data frame. First, though, subtract 0.1 from every element of z. Also, there's no need to create an empty data frame beforehand.

z_adj <- z - 0.1

T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
        MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))

colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)

all.equal(T_modelled, T_modelled2)
# [1] TRUE
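None of the answers here uses it, but base R's outer() goes one step further. It evaluates a vectorized function on every (z, t) pair in a single call, so even the sapply() iteration disappears. The sketch below assumes the constants from the question. The helper name T_fun is hypothetical.

```r
# Constants from the question
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
z_adj <- z - 0.1

# Hypothetical helper: uses only elementwise arithmetic,
# so it accepts equal-length vectors zz and tt
T_fun <- function(zz, tt) {
  a <- (omega/(2*K))^0.5
  MANSRT + As*exp(-zz*a)*sin(omega*tt - a*zz)
}

# outer() pairs every element of z_adj with every element of t_sec_ERT,
# calls T_fun once on the expanded vectors, and returns a
# length(z) x length(t_sec_ERT) matrix (rows = depths, columns = times)
T_mat <- outer(z_adj, t_sec_ERT, T_fun)

T_modelled3 <- data.frame(T_mat)
colnames(T_modelled3) <- paste0("t=", t_sec_ERT)
rownames(T_modelled3) <- paste0("z=", z)
```

outer() builds two expanded vectors of 150 × 2784 = 417,600 elements each before calling T_fun. That is cheap at this size but worth keeping in mind for much larger grids.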

Upvotes: 2

AkselA

Reputation: 8856

Rui is of course correct; I just want to suggest a way of reasoning when writing a loop like this.

You have two numeric vectors. Functions for numerics in R are usually vectorized, by which I mean you can do something like this:

x <- c(1, 6, 3)
sum(x)

without needing something like this:

x_ <- 0
for (i in x) {
    x_ <- i + x_ 
}
x_

That is, no explicit loop is needed in R. Of course, looping still takes place; it just happens in the underlying C, Fortran, etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: the looping takes place "under the hood", as it were. The output of Vectorize() thus isn't strictly vectorized by this definition, since the wrapper it returns still calls the original function once per element (via mapply()).
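To illustrate that last point, here is a minimal sketch using a hypothetical function, clip_scalar(), that only works on a single value because of its if. Vectorize() makes it accept a vector, but the wrapper still calls clip_scalar() once per element. The built-in pmin() gives the same result with the loop running in compiled code.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
clip_scalar <- function(x) {
  if (x > 2) 2 else x
}

# Vectorize() returns a wrapper that calls clip_scalar() once per element
clip_vec <- Vectorize(clip_scalar)
clip_vec(c(1, 6, 3))
# [1] 1 2 2

# A genuinely vectorized equivalent: pmin() loops in compiled code
pmin(c(1, 6, 3), 2)
# [1] 1 2 2
```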

When you have two numeric vectors you want to loop over, first check whether the constituent functions are vectorized, usually by reading their documentation.

If they are, construct the central vectorized compound expression and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT):

z_tmp <- z - 0.1
i <- 1

T_tmp <- MANSRT + As * 
         exp(-z_tmp*(omega/(2*K))^0.5) * 
         sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
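Here is a minimal sketch of how to confirm that this single-scalar test looks right, assuming the constants from the question. The result should contain one value per depth. At the first depth z_tmp is exactly 0, so both the damping factor and the phase lag drop out, leaving MANSRT + As*sin(omega*t).

```r
# Constants and vectors from the question
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
K <- (0.00219*1000)/(790*2600)
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

z_tmp <- z - 0.1
i <- 1

T_tmp <- MANSRT + As *
         exp(-z_tmp*(omega/(2*K))^0.5) *
         sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)

# One value per depth
length(T_tmp) == length(z)
# [1] TRUE

# At z_tmp = 0 the formula reduces to MANSRT + As*sin(omega*t)
isTRUE(all.equal(T_tmp[1], MANSRT + As*sin(omega*t_sec_ERT[i])))
# [1] TRUE
```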

Looks OK. Then you start looping over the elements of t_sec_ERT.

T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))

for (i in 1:length(t_sec_ERT)) {
    T_tmp[, i] <- MANSRT + As * 
             exp(-z_tmp*(omega/(2*K))^0.5) * 
             sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}

Or you can do it with sapply(), which is often neater.

f <- function(x) {
    MANSRT + As * 
    exp(-z_tmp*(omega/(2*K))^0.5) * 
    sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}

T_tmp <- sapply(t_sec_ERT, f)
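The column-wise loop and the sapply() version should produce the same 150 × 2784 matrix. The following is a minimal check, assuming the constants from the question and using seq_along() for the loop index.

```r
# Constants and vectors from the question
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
K <- (0.00219*1000)/(790*2600)
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
z_tmp <- z - 0.1

# Column for one time x, vectorized over all depths in z_tmp
f <- function(x) {
  MANSRT + As *
    exp(-z_tmp*(omega/(2*K))^0.5) *
    sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}

# Loop version: fill a preallocated matrix column by column
T_loop <- matrix(nrow = length(z), ncol = length(t_sec_ERT))
for (i in seq_along(t_sec_ERT)) T_loop[, i] <- f(t_sec_ERT[i])

# sapply() version: each call returns one column; sapply binds them
T_sapply <- sapply(t_sec_ERT, f)

dim(T_sapply)
# [1]  150 2784
isTRUE(all.equal(T_loop, T_sapply))
# [1] TRUE
```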

Upvotes: 1

David Klotz

Reputation: 2431

I would prefer to put the data in long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this a base R solution:

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

v <- expand.grid(t_sec_ERT, z) 
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)

T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
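The byrow = TRUE above is correct because expand.grid() varies its first argument fastest. The small sketch below uses hypothetical toy vectors in place of t_sec_ERT and z to show that ordering, and to show that filling by row recovers a grid with depths as rows and times as columns.

```r
# Hypothetical toy vectors standing in for t_sec_ERT and z
t_toy <- c(10, 20, 30)
z_toy <- c(1, 2)

# First argument varies fastest:
# v$t is 10 20 30 10 20 30, v$z is 1 1 1 2 2 2
v <- expand.grid(t = t_toy, z = z_toy)

# Toy function of both columns, so each cell is easy to check by eye
v$val <- v$t + v$z

# Rows = z, columns = t; fill row by row because t varies fastest in v
m <- matrix(v$val, nrow = length(z_toy), ncol = length(t_toy), byrow = TRUE)
# m[1, ] is 11 21 31 (z = 1); m[2, ] is 12 22 32 (z = 2)
```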

Upvotes: 0

Rui Barradas

Reputation: 76653

I believe this does it.
Run this first instruction right after creating T_modelled; the copy will be needed to test that the results are equal.

Tm <- T_modelled

Now run your code, then run the code below.

z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
    T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
    Tm[ , i] <- T_tmp
}

all.equal(T_modelled, Tm)
#[1] TRUE

You don't need the inner loop; that's the only difference.
(I also used exp() directly, but that is of secondary importance.)
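On the exp() remark: with e <- exp(1), e^x and exp(x) agree to within floating-point tolerance, so the choice does not change the results. exp(x) is simply the more direct call and does not rely on a global variable named e. Here is a minimal check.

```r
e <- exp(1)
x <- seq(-5, 5, by = 0.5)

# Same values up to rounding error
isTRUE(all.equal(e^x, exp(x)))
# [1] TRUE
```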

Upvotes: 2
