Reputation: 51
The code I show you here calculates exactly what I want, except for one problem: for large datasets, it takes just TOO LONG. Therefore, I was wondering if there is an alternative solution using the apply() family or some other way.
I always have a hard time to reexpress nested loops into vectorial functions. Do you think you could give me a hand? I would highly appreciate it ;)
So, before applying this nested loops I already have:
Here it goes ... sorry if it´s hard to understand ...
#loop for each WELL from "DATA_100_WELLS"
for (WELL_PROCESS in unique(DATA_100_WELLS$WELL)) {
#----------------------------------------------------------------------------------
#I take just 1 of the wells
print("WELL------------------------------------------------------------")
print(WELL_PROCESS)
DATA_WELL <- DATA_100_WELLS[DATA_100_WELLS$WELL==WELL_PROCESS,] #select just the well I want
#I calculate some stuff (Var_est0, sigma, linf, lsup, Za, Zb, n_ray and A)
DATA_WELL$Var_est0 = predict(chosen_model,data.frame(predict=DATA_WELL$predict))
DATA_WELL$sigma = sqrt(DATA_WELL$Var_est0)
DATA_WELL$linf <- DATA_WELL$predict+DATA_WELL$sigma*ta
DATA_WELL$lsup <- DATA_WELL$predict+DATA_WELL$sigma*tb
Za <- qnorm(alfa/2)
Zb <- qnorm(1-alfa/2)
n_ray <- mean(DATA_WELL$predict)
A = sum(DATA_WELL$Var_est0)
#Then i create an empty df called "TABLE", and slice off the heading
TABLE<-data.frame(well="",d=0,p=0)
TABLE<-TABLE[-1,]
#After that, I iterate over each WELL from the second df, "DATA"
for (well in unique(DATA$WELL)){
print(paste("Process...: ",well,sep=""))
#I calculate variable "large",based on max value of the existing variable "DELTA" (numeric)
large = max(DATA[DATA$WELL==well,]$DELTA)
#cicle from 1 max.distance (large-1)
for (d in c(1:(large-1))){
#cicle from position 1 to large-distance (look how this turns to be symmetric)
for (pos in (1:(large-d))){
#I did all of this to calculate variables ti and tj
ti = DATA[DATA$WELL==well & DATA$DELTA==pos,]$ti
tj = DATA[DATA$WELL==well & DATA$DELTA==pos+d,]$ti
#I append the results into the once empty df "TABLE", and calculate p based on ti*tj
TABLE<-rbind(TABLE,data.frame(well=well,d=d,p=ti*tj))
}
}
}
References:
That´s it. The solution could also be written in Python, if necessary.
Seriously, thank you!!
Upvotes: 0
Views: 165
Reputation: 326
As mentioned in the comments it is difficult to help without a reproducible example though I'll give it a try. The following changes should speed things up:
1) do not rbind an object to itself repeatedly rather insert elements into a list and call bind_rows after the loops.
2) subset the DATA df into the wells_df for every well in the outer loop.
Even better than #2, but not implemented, you could split the DATA into a list by WELL ahead of the loops so you are only traversing DATA one time.
I did not run this code.
table_agg <- list()
for (well in unique(DATA$WELL)){
print(paste("Process...: ",well,sep=""))
#I calculate variable "large",based on max value of the existing variable "DELTA" (numeric)
wells_df <- DATA[DATA$WELL==well,]
large = max(wells_df$DELTA)
#cicle from 1 max.distance (large-1)
for (d in c(1:(large-1))){
#cicle from position 1 to large-distance (look how this turns to be symmetric)
for (pos in (1:(large-d))){
#I did all of this to calculate variables ti and tj
ti = wells_df[wells_df$DELTA==pos,]$ti
tj = wells_df[wells_df$DELTA==pos+d,]$ti
#I append the results into the once empty df "TABLE", and calculate p based on ti*tj
table_agg[[length(table_agg)+1]]<-data.frame(well=well,d=d,p=ti*tj)
}
}
}
TABLE <- dplyr::bind_rows(table_agg)
Upvotes: 2