I am building a DataFrame row by row and then running a regression on it. For simplicity, the code is:
using DataFrames
using GLM
df = DataFrame(response = Number[])
for i in 1:10
df = vcat(df, DataFrame(response = rand()))
end
fit(LinearModel, @formula(response ~ 1), df)
I get the error:
ERROR: LoadError: MethodError: Cannot `convert` an object of type Array{Number,1} to an object of type GLM.LmResp
This may have arisen from a call to the constructor GLM.LmResp(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] fit(::Type{GLM.LinearModel}, ::Array{Float64,2}, ::Array{Number,1}) at ~/.julia/v0.6/GLM/src/lm.jl:140
[2] #fit#44(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Type{GLM.LinearModel}, ::StatsModels.Formula, ::DataFrames.DataFrame) at ~/.julia/v0.6/StatsModels/src/statsmodel.jl:72
[3] fit(::Type{GLM.LinearModel}, ::StatsModels.Formula, ::DataFrames.DataFrame) at ~/.julia/v0.6/StatsModels/src/statsmodel.jl:66
[4] include_from_node1(::String) at ./loading.jl:576
[5] include(::String) at ./sysimg.jl:14
while loading ~/test.jl, in expression starting on line 10
The call to the linear regression is very similar to the regression example in "Introducing Julia":
linearmodel = fit(LinearModel, @formula(Y1 ~ X1), anscombe)
What is the problem?
After a few hours, I realized that GLM requires a concrete element type for the response, and Number is an abstract type. The documentation for GLM.LmResp does not mention this at the time of this writing; it says only "Encapsulates the response for a linear model". The solution is to declare the column with a concrete type, such as Float64:
using DataFrames
using GLM
df = DataFrame(response = Float64[])
for i in 1:10
df = vcat(df, DataFrame(response = rand()))
end
fit(LinearModel, @formula(response ~ 1), df)
Output:
StatsModels.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: response ~ +1
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 0.408856 0.0969961 4.21518 0.0023
The type has to be concrete: for example, the abstract type Real, as in
df = DataFrame(response = Real[])
also fails, though with a more helpful error message:
ERROR: LoadError: `float` not defined on abstractly-typed arrays; please convert to a more specific type
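To check whether a column's element type is concrete before handing it to GLM, here is a minimal Base-only sketch. It assumes Julia 0.7 or later (the post itself uses 0.6, where the equivalent check was `isleaftype`), and the variable names are illustrative only:

```julia
# Assumes Julia 0.7 or later; on 0.6 the equivalent check was `isleaftype`.
# Empty vectors declared with the element types discussed in the post.
number_col = Number[]
real_col = Real[]
float_col = Float64[]

# Pair each element type with whether it is concrete.
checks = [(T, isconcretetype(T)) for T in (eltype(number_col), eltype(real_col), eltype(float_col))]

for (T, ok) in checks
    println(T, " => ", ok ? "concrete" : "abstract")
end
```

Only the Float64 column reports a concrete element type, which matches the two declarations that fail above and the one that works.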
Alternatively, you can map the column through Real after building the DataFrame:
using DataFrames
using GLM
df = DataFrame(response = Number[])
for i in 1:10
df = vcat(df, DataFrame(response = rand()))
end
df2 = DataFrame(response = map(Real, df[:response]))
fit(LinearModel, @formula(response ~ 1), df2)
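This works because map infers the element type of its result from the values it actually produces. Real(x) returns a Float64 value unchanged, so the new column ends up with the concrete type Array{Float64,1} rather than Array{Real,1}: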
julia> typeof(df2[:response])
Array{Float64,1}
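The narrowing can be reproduced without DataFrames or GLM. The sketch below uses only Base; the vector `v` and its values are made up for illustration, and the explicit `convert` is one alternative that states the target type directly instead of relying on inference:

```julia
# A vector with an abstract element type that happens to hold only Float64 values.
v = Number[0.25, 0.5, 0.75]

# map builds a new array and infers its element type from the returned values.
narrowed = map(Real, v)

# Explicit conversion to a concrete element type; errors if a value cannot be converted.
explicit = convert(Vector{Float64}, v)

println(eltype(v))         # Number
println(eltype(narrowed))  # Float64
println(eltype(explicit))  # Float64
```

Note that the map approach depends on every value already having the same concrete type. If the column held a mix of numeric types, the inferred element type would likely stay abstract, so the explicit conversion is the more predictable choice.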
I filed an issue with GLM to improve the error message.