Reputation: 31
I would like to fit a GLM in R with family = Cauchy(),
but I can't find anything that does it. Maybe it makes no sense, or maybe the Gaussian family does the job. Can someone explain to me how to do it, or why I can't?
Upvotes: 1
Views: 1215
Reputation: 3690
An alternative approach is a Bayesian regression with t-distributed errors.
Recall that the Cauchy distribution is equivalent to the t distribution with nu = 1 degree of freedom (see Wikipedia).
So rather than fix the degrees of freedom to be equal to 1, we can estimate them together with the other model parameters.
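As a quick sanity check (base R only, my own addition), the Cauchy density is exactly the t density with one degree of freedom:

```r
# The Cauchy density equals the t density with df = 1:
# both are 1 / (pi * (1 + x^2))
x <- seq(-5, 5, by = 0.1)
all.equal(dcauchy(x), dt(x, df = 1))
#> [1] TRUE
```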
Here is how to do this with the brms package. We'll use the default prior for nu, which is Gamma(2, 0.1).
data(ereturns, package = "heavy")
library("brms")
fit.bayes <- brm(
  m.marietta ~ CRSP,
  family = student,
  data = ereturns,
  seed = 1234
)
prior_summary(fit.bayes)
#> prior class coef group resp dpar nlpar lb ub source
#> (flat) b default
#> (flat) b CRSP (vectorized)
#> student_t(3, 0, 2.5) Intercept default
#> gamma(2, 0.1) nu 1 default
#> student_t(3, 0, 2.5) sigma 0 default
The result of the Bayesian analysis is that, even though the fitted distribution is heavy-tailed, the posterior estimate of the degrees of freedom is well above 1 (posterior mean ≈ 4.4, posterior mode ≈ 3.3), so the data do not support a Cauchy (nu = 1) error distribution.
summary(fit.bayes)
#> Family: student
#> Links: mu = identity; sigma = identity; nu = identity
#> Formula: m.marietta ~ CRSP
#> Data: ereturns (Number of observations: 60)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept -0.01 0.01 -0.02 0.01 1.00 3552 2674
#> CRSP 1.30 0.21 0.90 1.73 1.00 3267 2322
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma 0.06 0.01 0.04 0.08 1.00 2885 3281
#> nu 4.39 2.28 1.96 9.90 1.00 3435 3017
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
# Posterior mode via a kernel density estimate
mode <- function(x) {
  z <- density(x)
  z$x[which.max(z$y)]
}

library("dplyr")
as_draws_df(fit.bayes) %>%
  summarise(
    nu_mean = mean(nu),
    nu_posterior_mode = mode(nu)
  )
#> # A tibble: 1 × 2
#> nu_mean nu_posterior_mode
#> <dbl> <dbl>
#> 1 4.39 3.35
For comparison, here are the estimates with the heavy package as suggested by @StéphaneLaurent.
library("heavy")
fit.heavy <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit.heavy)
#> Linear model under heavy-tailed distributions
#> Data: ereturns; Family: Cauchy()
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.135080 -0.032849 0.003565 0.044033 0.560639
#>
#> Coefficients:
#> Estimate Std.Error Z value p-value
#> (Intercept) -0.0096 0.0072 -1.3378 0.1810
#> CRSP 1.1638 0.1670 6.9669 0.0000
#>
#> Degrees of freedom: 60 total; 58 residual
#> Scale estimate: 0.001479587
#> Log-likelihood: 66.42185 on 3 degrees of freedom
Created on 2023-04-07 with reprex v2.0.2
Upvotes: 1
Reputation: 226097
There are two reasons you can't do this with a GLM (specifically, with glm()): the Cauchy distribution is not a member of the exponential family, which is the class of distributions glm() is designed for; and it has no finite mean or variance, so the mean-variance relationship that the iteratively reweighted least squares fitting algorithm relies on is undefined.
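To see the second point concretely, here is a small base-R illustration (my own addition, not part of the original answer): the running mean of Cauchy draws never settles down, while the Gaussian running mean converges as the law of large numbers predicts.

```r
set.seed(1)
n <- 1e5
# Running means: cumulative sums divided by the sample size so far
cauchy_means <- cumsum(rcauchy(n)) / seq_len(n)
normal_means <- cumsum(rnorm(n)) / seq_len(n)

# The Gaussian running mean settles near 0; the Cauchy one keeps
# taking large jumps whenever an extreme draw arrives.
tail(normal_means, 1)
tail(cauchy_means, 1)
```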
While @StéphaneLaurent's suggestion to use the heavy package is probably best, you can also do this using a general maximum-likelihood approach:
library(bbmle)
fit2 <- mle2(m.marietta ~ dcauchy(mu, exp(logsc)),
             parameters = list(mu ~ CRSP),
             start = list(mu = 0, logsc = 0),
             data = ereturns)
coef(fit) (the heavyLm fit from @StéphaneLaurent's answer) and coef(fit2) show that the results are essentially the same.
Upvotes: 3
Reputation: 84529
You can use the heavy package (Robust Estimation Using Heavy-Tailed Distributions).
library(heavy)
data(ereturns)
fit <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit)
Upvotes: 3