Reputation: 1363
Rather than adding in more comments or making my original question longer, I have created another question. I have received excellent advice in the previous question (here) but I am not good enough in R to implement the suggestions in the comments.
The original code, that took ages, was:
Male.MC <-c()
for (j in 1:100) {
for (i in 1:nrow(Male.Distrib)) {
u2 <- Male.Distrib$stddev_u2[i] * rnorm(1, mean = 0, sd = 1)
mc_bca <- Male.Distrib$FixedEff[i] + u2
temp <- Lambda.Value*mc_bca+1
ginv_a <- temp^(1/Lambda.Value)
d2ginv_a <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
z <- data.frame(
RespondentID = Male.Distrib$RespondentID[i],
Subgroup = Male.Distrib$Subgroup[i],
mc_amount = mc_amount,
IndvWeight = Male.Distrib$INDWTS[i]/100
)
Male.MC <- as.data.frame(rbind(Male.MC,z))
}
}
The replicate()
answer worked well when I thought I only needed one output (mc_amount
) from the function:
Male.Distrib = read.table('MaleDistrib.txt', check.names=F)
getMC <- function(df, Lambda.Value=0.4, Male.Resid.Var=12.1029420429778) {
u2 <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
mc_bca <- df$FixedEff + u2
temp <- Lambda.Value*mc_bca+1
ginv_a <- temp^(1/Lambda.Value)
d2ginv_a <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
mc_amount
}
replicate(10, getMC(Male.Distrib))
However, even with data corrections made, I am getting unexpected results so I need to be able to see the values for all the interim calculations to determine where I have gone wrong in my logic. This is where I'm stuck. I have created a smaller data frame called tempdata
for testing, which is just head()
from my larger dataset of 7135 observations. The tempdata
set is:
RndmEff RespondentID Subgroup RespondentID Replicates IntakeAmt RACE INDWTS TOTWTS GRPWTS NUMSUBJECTS TOTSUBJECTS FixedEff stddev_u2
1 1.343753 9966 6 9966 41067 33.449808 2 41067 120622201 41657878 1466 7135 6.089918 2.645938
2 -5.856516 9967 5 9967 2322 2.533528 3 2322 120622201 22715139 1100 7135 6.755664 2.645938
3 -3.648339 9970 4 9970 17434 9.575439 2 17434 120622201 10520535 1424 7135 7.079757 2.645938
4 2.697533 9972 6 9972 21723 43.340180 2 21723 120622201 41657878 1466 7135 6.089918 2.645938
5 3.531878 9974 3 9974 375 55.660607 3 375 120622201 10791729 1061 7135 6.176319 2.645938
6 6.627767 9976 6 9976 48889 91.480049 2 48889 120622201 41657878 1466 7135 6.089918 2.645938
The updated commands that I am using are:
getMC <- function(df, Lambda.Value=0.4, Male.Resid.Var=12.1029420429778) {
RespondentID <- df$RespondentID
u2 <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
mc_bca <- df$FixedEff + u2
temp <- max(Lambda.Value*mc_bca+1,Lambda.Value*Min_bca+1)
ginv_a <- temp^(1/Lambda.Value)
d2ginv_a <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
return(list(RespondentID, temp, ginv_a, d2ginv_a, mc_amount))
}
Test <- replicate(10, getMC(tempdata))
I get a very nice layout for my calculated variables (temp
, ginv_a
, d2ginv_a
, mc_amount
) but there are two problems with the results. These problems could be related, I don't understand enough to work out what is happening.
First, I only get 10 columns relating to the first RespondentID, so the function does not seem to be applied to the 6 that are in the dataset.
Second, I get 10 columns, but the RespondentID
results are concatenated into one cell in each column. If I add u2
or mc_bca
to the return list, these are also similarly concatenated into one cell. I have read the R
help for return
and it contains this line
value could be a series of non-empty expressions separated by commas. In that case the value returned is a list of the evaluated expressions, with names set to the expressions where these are the names of R objects. but I don't understand enough about R function programming to know if that is relevant.
I'm hoping there is a quick and obvious fix to this. I have been unable to find a similar problem of which I could copy the solution, all the examples of multiple returns from functions that I have found have used variables being calculated in the function.
I have tried the alternative of creating an empty data frame
and then trying to vectorize the results into that. I'm worse at vectorizing than I am at replication.
Update: missed the min_bca
value, which is -2.44478269434376
Upvotes: 0
Views: 515
Reputation: 32986
After a couple of more edits, hopefully here is the final solution to your question.
getMC <- function(df, Lambda.Value=0.4, Male.Resid.Var=12.1029420429778,Min_bca=-2.44478269434376) {
u2 <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
mc_bca <- df$FixedEff + u2
temp <- pmax((Lambda.Value*mc_bca+1),(Lambda.Value*Min_bca+1))
ginv_a <- temp^(1/Lambda.Value)
d2ginv_a <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
return(data.frame(RespondentID=df$RespondentID,temp=temp, ginv_a, d2ginv_a, mc_amount))
}
data=rep(list(tempdata),10) # change 10 to a higher number of replicates
result_data=llply(data,getMC, .progress = "text")
Some notes: I had to troubleshoot your function on a single replicate, line by line, to find out what was wrong (this is something you should do before posting because the question above is not about this issue). max(vector1,vector2)
returns a single value which makes temp
the same for all RespondentID
. Instead I replaced it with pmax
(see ?max
for an explanation).
Upvotes: 1