How to create a range out of a continuous variable in Julia?

Question

I am running the average marginal effects in Julia using the Effects package. My aim is to see how weight change between men and women at different ages. As you can see in the output below, it runs the average marginal effects for each age for man and women. However, I would like to take a range of the age variable and not take each year on itself alone. For example I would like to have the age range of 0:5, 5:10, 10:15 and so on. This has to be done after the regression model is run and not beforehand. I tried to work it on my own, but I am not fluent enough in Julia.

So the only line that needs to be rectified is the following:

d1 = Dict(:sex => ["male","female"],:age => [0:5; 6:20])

Here is the code:

using DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
                           sex=repeat(["male", "female"], inner=8),
                           weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))

mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
d1 = Dict(:sex => ["male","female"],:age => [0:5; 6:20])
ave = effects(d1, mod_uncentered)

OUTPUT

    sex   age   weight   err    lower   upper
String  Int64   Float64 Float64 Float64 Float64
1   male    0   0.287822    2.88762 -2.5998 3.17545
2   female  0   56.4387 2.88762 53.5511 59.3263
3   male    1   8.00869 2.71603 5.29266 10.7247
4   female  1   59.8481 2.71603 57.1321 62.5641
5   male    2   15.7296 2.54468 13.1849 18.2742
6   female  2   63.2575 2.54468 60.7128 65.8022
7   male    3   23.4504 2.37361 21.0768 25.824
8   female  3   66.6669 2.37361 64.2933 69.0405
9   male    4   31.1713 2.2029  28.9684 33.3742
10  female  4   70.0763 2.2029  67.8734 72.2792
11  male    5   38.8922 2.03264 36.8595 40.9248
12  female  5   73.4857 2.03264 71.4531 75.5184
13  male    6   46.613  1.86295 44.7501 48.476
14  female  6   76.8951 1.86295 75.0322 78.7581
15  male    7   54.3339 1.69399 52.6399 56.0279
16  female  7   80.3046 1.69399 78.6106 81.9985
17  male    8   62.0548 1.52602 60.5288 63.5808
18  female  8   83.714  1.52602 82.1879 85.24
19  male    9   69.7756 1.3594  68.4162 71.135
20  female  9   87.1234 1.3594  85.764  88.4828
21  male    10  77.4965 1.19469 76.3018 78.6912
22  female  10  90.5328 1.19469 89.3381 91.7275
23  male    11  85.2174 1.03282 84.1846 86.2502
24  female  11  93.9422 1.03282 92.9094 94.975
25  male    12  92.9383 0.875345    92.0629 93.8136
26  female  12  97.3516 0.875345    96.4762 98.2269
27  male    13  100.659 0.72515 99.934  101.384
28  female  13  100.761 0.72515 100.036 101.486
29  male    14  108.38  0.587838    107.792 108.968
30  female  14  104.17  0.587838    103.583 1

Livius · Accepted Answer

For those familiar with R, Effects.jl is comparable to the effects package and not the emmeans package. While there is a certain amount of overlap in effects and emmeans, effects "only" makes predictions for particular values of the predictors, while emmeans is capable of computing marginal averages over several values (e.g., ranges) of the predictors.

Effects.jl is essentially a wrapper for doing a few things:

computing a fully crossed "reference grid" of a small set of predictors
finding typical values for all other predictors in that model. (generally the mean, but you can use a different summary function, note that you need to think about what your summary function means for contrasts associated with a categorical predictor has an interpretation)
adding these typical values into the reference grid for a fully specified set of data to make predictions on
computing predictions and associated standard errors based on the variance-covariance matrix of the model parameter estimates (vcov). Note that for mixed models, this means only the fixed effects play a role. (The same holds for the use of the effects package in R with models fit with lme4.)

In other words, Effects.jl doesn't understand ranges, it just understands a set of values. It doesn't know how to make a prediction for 0:5, but it does know how to make predictions for 0, 1, etc.

Since you're interesting in the average prediction over a range, you could just compute the average of the predictions you have:

julia> using Statistics

julia> transform!(ave, :age => ByRow(x -> x <= 5 ? "0:5" : "6:20") => :age_bin)
42×7 DataFrame
 Row │ sex     age    weight      err       lower      upper      age_bin 
     │ String  Int64  Float64     Float64   Float64    Float64    String  
─────┼────────────────────────────────────────────────────────────────────
   1 │ male        0    0.287822  2.88762    -2.5998     3.17545  0:5
   2 │ female      0   56.4387    2.88762    53.5511    59.3263   0:5
   3 │ male        1    8.00869   2.71603     5.29266   10.7247   0:5
   4 │ female      1   59.8481    2.71603    57.1321    62.5641   0:5
   5 │ male        2   15.7296    2.54468    13.1849    18.2742   0:5
   6 │ female      2   63.2575    2.54468    60.7128    65.8022   0:5
   7 │ male        3   23.4504    2.37361    21.0768    25.824    0:5
   8 │ female      3   66.6669    2.37361    64.2933    69.0405   0:5
   9 │ male        4   31.1713    2.2029     28.9684    33.3742   0:5
  10 │ female      4   70.0763    2.2029     67.8734    72.2792   0:5
  11 │ male        5   38.8922    2.03264    36.8595    40.9248   0:5
  12 │ female      5   73.4857    2.03264    71.4531    75.5184   0:5
  13 │ male        6   46.613     1.86295    44.7501    48.476    6:20
  14 │ female      6   76.8951    1.86295    75.0322    78.7581   6:20
....
julia> rms(x) = sqrt(mean(abs2, x))
rms (generic function with 1 method)

julia> combine(groupby(ave, [:sex, :age_bin]), :weight => mean, :err => rms; renamecols=false)

4×4 DataFrame
 Row │ sex     age_bin  weight    err     
     │ String  String   Float64   Float64 
─────┼────────────────────────────────────
   1 │ male    0:5       19.59    2.47686
   2 │ female  0:5       64.9622  2.47686
   3 │ male    6:20     100.659   1.04247
   4 │ female  6:20     100.761   1.04247

For the error, I used root-mean-square (RMS): in other words, take the mean of the associated variances and then convert back to standard deviation scale. (Standard errors are the standard deviation of the sampling distribution of a test statistic.)

For this particular model (well-balanced data, no pesky covariates, no nonlinear transformations of the response), this works out to be the same prediction you would get from taking the mean of the predictors and then computing a single prediction:

julia> d2 = Dict(:sex => ["male","female"],:age => [ mean(0:5); mean(6:20)])
Dict{Symbol, Vector} with 2 entries:
  :sex => ["male", "female"]
  :age => [2.5, 13.0]

julia> effects(d2, mod_uncentered)
4×6 DataFrame
 Row │ sex     age      weight    err      lower     upper    
     │ String  Float64  Float64   Float64  Float64   Float64  
─────┼────────────────────────────────────────────────────────
   1 │ male        2.5   19.59    2.4591    17.1309   22.0491
   2 │ female      2.5   64.9622  2.4591    62.5031   67.4213
   3 │ male       13.0  100.659   0.72515   99.934   101.384
   4 │ female     13.0  100.761   0.72515  100.036   101.486

The errors are somewhat smaller, because the error here reflects the uncertainty associated with a single prediction, while the error above reflects the uncertainty from several predictions.

How to create a range out of a continuous variable in Julia?

Answers (2)

Related Questions