Reputation: 137
I am using a modified quantile regression function found in a scientific paper. (source: https://arxiv.org/pdf/2111.04805.pdf)
I try to apply this particular methodology to my dataset, but I would like to perform this quantile regression based on a condition from my dataset. This means that I want to implement something similar to a group_by condition as found here:
qr_models = data %>%
group_by(latitude) %>%
do(model = rq(julian_day~year, tau = 1:99/100, method = "fn", data= .))
The way it is coded in the paper is the following and the loop is where I would like to add the group_by:
for (i in 1:n) {
alpha = i/(n+1)
alphas[i] = alpha
betas <- smrq(X,y,tau=alpha)
vals0[i] <- sum(y<(X %*% betas))
}
where n is n <- 99
(the quantile resolution chosen); vals0 <- rep(0,n)
and alphas <- rep(0,n)
. I tend to avoid using loops in R so I am a bit lost on how to do it.
Just in case it is needed to understand, the smrq()
function is the one that is described in the paper mentioned previously, and is coded like this:
smrq <- function(X, y, tau){
p = ncol(X)
op.result <- optim(rep(0, p),
fn = minimize.logcosh,
method = 'BFGS',
X = X,
y = y,
tau = tau)
beta <- op.result$par
return (beta)
}
where X is the explanatory variable matrix and y the response variable.
I hope it is clear enough, if more details are needed I will be happy to update my post. Many thanks for the precious help.
Upvotes: 0
Views: 73
Reputation: 107757
Consider encapsulating the author's entire processing in a user-defined method that receives data (swiss
from authors) as input parameter along with other variables including formula (Fertility ~ .
from authors) and response column ("Fertility"
from authors).
Then, pass data subsets with group_by
. Also, the authors for
loop can be refactored to a vectorized loop such as sapply
or vapply
since the return is a numeric vector.
Generalized Functions
minimize.logcosh <- function(par, X, y, tau) {
diff <- y-(X %*% par)
check <- (tau-0.5)*diff+(0.5/0.7)*logcosh(0.7*diff)+0.4
return(sum(check))
}
smrq <- function(X, y, tau){
p <- ncol(X)
op.result <- optim(
rep(0, p),
fn = minimize.logcosh,
method = 'BFGS',
X = X,
y = y,
tau = tau
)
beta <- op.result$par
return(beta)
}
run_smrq <- function(data, fml, response) {
x <- model.matrix(fml, data)[,-1]
y <- data[[response]]
X <- cbind(x, rep(1,nrow(x)))
n <- 99
betas <- sapply(1:n, function(i) smrq(X, y, tau=i/(n+1)))
# betas <- vapply(1:n, function(i) smrq(X, y, tau=i/(n+1)), numeric(1))
return(betas)
}
Callers
Test authors' example
swiss <- datasets::swiss
smrq_models <- run_smrq(data=swiss, fml=Fertility~., response="Fertility")
dplyr
(using group_map
)
smrq_models <- data %>%
group_by(latitude) %>%
group_map(~ run_smrq(data=., fml=julian_day~year, response="julian_day")
base
(using by
, the object-oriented wrapper to tapply
)
smrq_models <- by(
data,
data$latitude,
function(sub) run_smrq(data=sub, fml=julian_day~year, response="julian_day")
)
Upvotes: 2