Alexandre PAMBIANCHI
Alexandre PAMBIANCHI

Reputation: 31

GLM with Cauchy family

I would like to train a GLM on R with family=Cauchy() but I don't find anything to do it. It looks like it has no sense or maybe Gaussian family does the job. Can someone explain me how to do it or why I can't?

Upvotes: 1

Views: 1215

Answers (3)

dipetkov
dipetkov

Reputation: 3690

An alternative approach is a Bayesian regression with t-distributed errors.

Recall that the Cauchy distribution is equivalent to the t distribution with nu = 1 degrees of freedom (ref. Wikipedia).

So rather than fix the degrees of freedom to be equal to 1, we can estimate them together with the other model parameters.

Here is how to do this with the brms package. We'll use the default prior for nu, which is Gamma(2, 0.1).

data(ereturns, package = "heavy")

library("brms")

fit.bayes <- brm(
  m.marietta ~ CRSP,
  family = student,
  data = ereturns,
  seed = 1234
)
prior_summary(fit.bayes)
#>                 prior     class coef group resp dpar nlpar lb ub       source
#>                (flat)         b                                       default
#>                (flat)         b CRSP                             (vectorized)
#>  student_t(3, 0, 2.5) Intercept                                       default
#>         gamma(2, 0.1)        nu                             1         default
#>  student_t(3, 0, 2.5)     sigma                             0         default

The result of the Bayesian analysis is that, even though the distribution family is heavy tailed, the posterior estimate of the degrees of freedom is between 3 and 4, not 1 (posterior mean = 4.4, posterior mode = 3.3).

summary(fit.bayes)
#>  Family: student 
#>   Links: mu = identity; sigma = identity; nu = identity 
#> Formula: m.marietta ~ CRSP 
#>    Data: ereturns (Number of observations: 60) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept    -0.01      0.01    -0.02     0.01 1.00     3552     2674
#> CRSP          1.30      0.21     0.90     1.73 1.00     3267     2322
#> 
#> Family Specific Parameters: 
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.06      0.01     0.04     0.08 1.00     2885     3281
#> nu        4.39      2.28     1.96     9.90 1.00     3435     3017
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

mode <- function(x) {
  z <- density(x) 
  z$x[which.max(z$y)] 
}

as_draws_df(fit.bayes) %>%
  summarise(
    across(nu, list(nu_posterior_mean = mean, posterior_mode = mode))
  )
#> # A tibble: 1 × 2
#>   nu_mean nu_posterior_mode
#>     <dbl>             <dbl>
#> 1    4.39              3.35

For comparison, here are the estimates with the heavy package as suggested by @StéphaneLaurent.

library("heavy")

fit.heavy <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit.heavy)
#> Linear model under heavy-tailed distributions
#>  Data: ereturns; Family: Cauchy() 
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.135080 -0.032849  0.003565  0.044033  0.560639 
#> 
#> Coefficients:
#>              Estimate Std.Error Z value p-value
#> (Intercept) -0.0096   0.0072   -1.3378  0.1810
#> CRSP         1.1638   0.1670    6.9669  0.0000
#> 
#> Degrees of freedom: 60 total; 58 residual
#> Scale estimate: 0.001479587
#> Log-likelihood: 66.42185 on 3 degrees of freedom

Created on 2023-04-07 with reprex v2.0.2

Upvotes: 1

Ben Bolker
Ben Bolker

Reputation: 226097

There are two reasons you can't do this with a GLM (specifically, with glm()).

  • In the narrow sense, GLMs are only defined for conditional distributions in the exponential family (Poisson, binomial, Gaussian, Gamma, and a few others).
  • More generally, you can extend GLMs to quasi-likelihood models that are based on the mean-variance relationship of a particular distribution. But the Cauchy has undefined/infinite mean and variance, so that won't work.

While @StéphaneLaurent's suggestion to use the heavy package is probably best, you can also do this using a general maximum likelihood approach:

library(bbmle)
fit2 <- mle2(m.marietta ~ dcauchy(mu, exp(logsc)), 
             parameters = list(mu~CRSP), 
             start=list(mu=0, logsc=0), 
             data=ereturns)

coef(fit) and coef(fit2) show that the results are essentially the same.

Upvotes: 3

St&#233;phane Laurent
St&#233;phane Laurent

Reputation: 84529

You can use the heavy package (Robust Estimation Using Heavy-Tailed Distributions).

library(heavy)
data(ereturns)
fit <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit)

Upvotes: 3

Related Questions