JeffHeaton

Reputation: 3278

How to round the results of a GLM predict in Julia

I am trying to do a very simple logistic regression in Julia, but Julia's type system seems to be causing me problems. Calling predict on the fitted GLM model gives me an array of probabilities. I want to round them so that a probability >= 0.5 becomes 1 and anything else becomes 0, and I would like those labels to be integers.
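The labelling rule above can be sketched in Julia 1.x on a plain vector of probabilities with no missing values. The vector p is made-up example data, not output from the model. Comparing with >= 0.5 and converting each Bool to Int matches the rule directly; rounding does not quite, because Julia's default RoundNearest mode rounds ties to the nearest even integer, so an exact 0.5 would be labelled 0.

```julia
# Hypothetical predicted probabilities (no missing values).
p = [0.1, 0.5, 0.73, 0.49]

# Threshold: probability >= 0.5 becomes 1, otherwise 0.
# `p .>= 0.5` is a BitVector; `Int.(...)` converts each Bool to 0 or 1.
labels = Int.(p .>= 0.5)

# Rounding differs at exactly 0.5: RoundNearest sends ties to the even integer 0.
rounded = round.(Int, p)

println(labels)   # [0, 1, 1, 0]
println(rounded)  # [0, 0, 1, 0]
```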

No matter what I do, I can't convert the DataArray returned by predict to Int64. If I create an ad hoc DataArray, I can round it just fine, even though both report the type DataArrays.DataArray{Float64,1}. I've also tried things like pred > 0.5, but that fails in a similar way. Clearly there is something about the value returned by predict, beyond its type, that makes it different from the other DataArray in my short program.

using DataFrames;
using GLM;

df = readtable("./data/titanic-dataset.csv");

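# Drop ID and text columns that are not used as predictors
delete!(df, :PassengerId);
delete!(df, :Name);
delete!(df, :Ticket);
delete!(df, :Cabin);
# Treat the categorical columns as pooled (factor) data
pool!(df, [:Sex]);
pool!(df, [:Embarked]);
# Replace missing ages with the median of the observed ages
df[isna.(df[:Age]), :Age] = median(df[.~isna.(df[:Age]), :Age])

# Logistic regression: binomial family with a logit link
model = glm(@formula(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked), df, Binomial(), LogitLink());
# Predicted survival probabilities, one per row of df
pred = predict(model, df);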

z = DataArray([1.0,2.0,3.0]);
println(typeof(z));
println(typeof(pred));
println(round.(Int64,z));  # Why does this work?
println(round.(Int64,pred)); # But this does not?

The output is:

DataArrays.DataArray{Float64,1}
DataArrays.DataArray{Float64,1}
[1, 2, 3]
MethodError: no method matching round(::Type{Int64}, ::DataArrays.NAtype)
Closest candidates are:
  round(::Type{T<:Integer}, ::Integer) where T<:Integer at int.jl:408
  round(::Type{T<:Integer}, ::Float16) where T<:Integer at float.jl:338
  round(::Type{T<:Union{Signed, Unsigned}}, ::BigFloat) where T<:Union{Signed, Unsigned} at mpfr.jl:214
  ...

Stacktrace:
 [1] macro expansion at C:\Users\JHeaton\.julia\v0.6\DataArrays\src\broadcast.jl:32 [inlined]
 [2] macro expansion at .\cartesian.jl:64 [inlined]
 [3] macro expansion at C:\Users\JHeaton\.julia\v0.6\DataArrays\src\broadcast.jl:111 [inlined]
 [4] _broadcast!(::DataArrays.##116#117{Int64,Base.#round}, ::DataArrays.DataArray{Int64,1}, ::DataArrays.DataArray{Float64,1}) at C:\Users\JHeaton\.julia\v0.6\DataArrays\src\broadcast.jl:67
 [5] broadcast!(::Function, ::DataArrays.DataArray{Int64,1}, ::Type{Int64}, ::DataArrays.DataArray{Float64,1}) at C:\Users\JHeaton\.julia\v0.6\DataArrays\src\broadcast.jl:169
 [6] broadcast(::Function, ::Type{T} where T, ::DataArrays.DataArray{Float64,1}) at .\broadcast.jl:434
 [7] include_string(::String, ::String) at .\loading.jl:515
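The error message names the culprit: round is being applied to a DataArrays.NAtype value, so pred contains NA entries even though its declared type looks identical to that of the ad hoc array. A quick way to check for such entries is sketched below for Julia 1.x, where DataArrays' NA has been superseded by Base's missing. The pred vector here is invented for illustration and is not real model output.

```julia
# Hypothetical predictions in which one row could not be scored.
pred = [0.12, missing, 0.81, 0.55]

# In Julia 1.x the element type itself signals that missing values are possible.
println(eltype(pred))

# Count and locate the missing entries before trying to convert to Int.
nmiss = count(ismissing, pred)
rows  = findall(ismissing, pred)
println(nmiss, " missing at rows ", rows)
```

Unlike DataArray{Float64,1}, whose type says nothing about NA, the Union{Missing, Float64} element type makes the difference visible up front.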

Upvotes: 0

Views: 514

Answers (1)

Michael K. Borregaard

Reputation: 8044

You can't create integers when you have NAs in pred. You can round them (in which case you'll get a DataArray of Floats), but when you try to make them Int it will complain, because NA can't be an Int64. A DataArray{Int}, however, stores NA separately from its values, so round first and then convert:

convert(DataArray{Int}, round.(pred))
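On Julia 1.x, where DataArrays and NA have been superseded by Base's missing, DataArray and this convert call no longer apply. A minimal sketch of the same idea either keeps the missing rows in a Vector{Union{Missing, Int}} or drops them first. It thresholds with >= 0.5 as the question asks rather than rounding; the data and the helper name to_label are hypothetical.

```julia
# Hypothetical predictions; `missing` plays the role of DataArrays' NA.
pred = [0.12, missing, 0.81, 0.5]

# Hypothetical helper: map a probability to a 0/1 label, passing `missing` through.
to_label(p) = ismissing(p) ? missing : Int(p >= 0.5)

# Option 1: keep one entry per row; missing rows stay missing.
labels = to_label.(pred)  # element type Union{Missing, Int64}

# Option 2: drop the missing rows and get a plain Vector{Int}.
labels_complete = [Int(p >= 0.5) for p in skipmissing(pred)]

println(labels)
println(labels_complete)  # [0, 1, 1]
```

Option 1 keeps the labels aligned with the rows of the data frame; option 2 is simpler when only the scored rows matter.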

Also, it is nicer to post an example that uses data available in a package rather than a local dataset on your computer, so that others can run it.

Upvotes: 2
