jjaskulowski

Reputation: 2574

How does ada::predict.ada work?

I asked this on Cross Validated but got no response, and since this is a technical, implementation-centric question I'm trying here.

I used ada::ada in R to create a boosted model based on decision trees.

It normally reports a matrix comparing the predicted results with the expected outcomes.

It looks something like this:

       FALSE  TRUE
FALSE  11023  1023  
TRUE     997  5673

That's cool, good accuracy.
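For reference, the accuracy implied by this matrix is the share of cases on its diagonal. A quick check in R, with the counts copied from the table above (which axis holds predictions and which holds true values does not change the result):

```r
# Counts copied from the printed confusion matrix
conf <- matrix(c(11023, 1023,
                   997, 5673),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("FALSE", "TRUE"), c("FALSE", "TRUE")))

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)
round(accuracy, 3)
# [1] 0.892
```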

Now it's time to predict on new data, so I tried:

predict(myadamodel, newdata=giveinputs())

But instead of a single TRUE/FALSE answer, I got:

[1] FALSE TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
[25] TRUE  TRUE  FALSE TRUE  TRUE  TRUE  TRUE  TRUE  FALSE TRUE  TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE TRUE  FALSE TRUE  FALSE TRUE  FALSE TRUE 
[49] FALSE FALSE
Levels: FALSE TRUE

I presume that this ada object is an ensemble and that I received one answer from each classifier.

But in the end I need a single, final TRUE/FALSE answer. If this is all I can get, I need to know how the ada function computes the final answer that it used to build the statistics. I would check that myself, but the ada function is precompiled.
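As background, a generic discrete AdaBoost model combines its trees with a weighted vote: each tree m casts a vote f_m(x) of -1 or +1, the votes are weighted by alpha_m, and the predicted class is the sign of the weighted sum. This is a minimal sketch of that general rule, not ada's source code; ada offers more than one boosting variant, and `votes`, `alpha` and `combine_votes` are hypothetical names used only for illustration.

```r
# Generic discrete-AdaBoost combination rule (illustration, not ada's code):
#   F(x) = sum_m alpha_m * f_m(x), each f_m(x) in {-1, +1}
#   final class is TRUE when F(x) > 0
# votes: hypothetical n x M matrix of per-tree votes (one column per tree)
# alpha: hypothetical length-M vector of tree weights
combine_votes <- function(votes, alpha) {
  stopifnot(ncol(votes) == length(alpha))
  score <- drop(votes %*% alpha)  # weighted sum of the votes for each row
  score > 0                       # ties (score == 0) fall to FALSE here
}

votes <- matrix(c( 1, -1, 1,
                  -1, -1, 1),
                nrow = 2, byrow = TRUE)
alpha <- c(0.9, 0.4, 0.3)
combine_votes(votes, alpha)
# [1]  TRUE FALSE
```

The point is that an ensemble still produces one answer per observation: the per-tree votes are collapsed into a single score before the class is chosen.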

How do I get a final TRUE/FALSE answer that is consistent with the statistics ada returns from the learning phase?

I've attached an example that you can copy-paste:

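library(ada)  # provides ada() and its predict method

# Build a grid of (a, b) points labelled by whether b > a
mydata = data.frame(a = numeric(0), b = double(0), r = logical(0))

for (i in -10:10) {
  for (j in 20:-4) {
    mydata[nrow(mydata) + 1, ] = c(a = i, b = j, r = (j > i))
  }
}

# Fit a boosted tree model, then predict a single new point
myada = ada(mydata[, c("a", "b")], mydata[, "r"])
print(myada)
predict(myada, data.frame(a = 4, b = 7))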

Note that the r column somehow ends up as 0/1 instead of TRUE/FALSE. I don't know why, or how to stop data.frame from converting TRUE/FALSE to 0/1, but the idea stays the same.
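One way to build the same grid without the row-by-row loop is to use `expand.grid()` and compute the label column in one vectorised comparison, which keeps `r` logical. A sketch, assuming integer columns for `a` and `b` are acceptable (the loop produced doubles); listing `b` first makes it vary fastest, so the rows come out in the same order as the nested loop:

```r
# b varies fastest (20 down to -4), a slowest, matching the loop's row order
grid <- expand.grid(b = 20:-4, a = -10:10, KEEP.OUT.ATTRS = FALSE)
mydata <- data.frame(a = grid$a, b = grid$b)

# Label every row at once; the comparison result stays logical
mydata$r <- mydata$b > mydata$a

str(mydata)
```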

Upvotes: 0

Views: 1130

Answers (1)

MrFlick

Reputation: 206546

OK, the reproducible example helped. This looks like a quirk in how predict works when the new data has just one row: in that case you get an estimate from each of the iterations (the default number of iterations is 50). Note that you get only two values back when you do

predict(myada, data.frame(a=4:3,b=7:8))

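This is because predict.ada uses sapply internally: with a single row, each iteration returns a length-one result, so sapply simplifies them into a plain vector instead of a matrix. We can make our own version that doesn't have this problem.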
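The sapply behaviour behind this can be reproduced in plain base R, together with the `do.call(rbind, lapply(...))` pattern that keeps the matrix shape. In this sketch `per_tree` is a hypothetical stand-in for one tree's predictions (one value per input row); the actual per-tree function inside predict.ada is not reproduced here.

```r
# Hypothetical stand-in for one tree's predictions: one value per input row
per_tree <- function(i, n_rows) rep(i, n_rows)
iter <- 5

# Two rows: sapply() returns a 2 x 5 matrix (rows = observations)
dim(sapply(1:iter, per_tree, n_rows = 2))
# [1] 2 5

# One row: each call returns a length-1 vector, so sapply() simplifies
# the result to a plain vector of length 5 and the matrix shape is lost
dim(sapply(1:iter, per_tree, n_rows = 1))
# NULL
length(sapply(1:iter, per_tree, n_rows = 1))
# [1] 5

# rbind-ing the lapply() results and transposing always gives n_rows x iter
dim(t(do.call(rbind, lapply(1:iter, per_tree, n_rows = 1))))
# [1] 1 5
```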

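# Copy the package's internal predict method so we can modify it
predict.ada <- ada:::predict.ada
# Replace the sapply() step (expression 12 of the body in ada 2.0-3) with one
# that always yields an nrow(newdata) x iter matrix, even for a single row
body(predict.ada)[[12]] <- quote(tmp <- t(do.call(rbind,
    lapply(1:iter, function(i) f(f = object$model$trees[[i]],
        dat = newdata)))))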

and then we can run

predict.ada(myada, newdata=data.frame(a=4,b=7))
# [1] TRUE
# Levels: FALSE TRUE

So this new value is predicted to be TRUE. This was tested with ada_2.0-3; because the patch relies on the position of the sapply call inside the function body, it may break in other versions.
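If patching the function body feels too fragile, a version-independent workaround follows from the observation above that two rows of input give two predictions: duplicate the single row, predict, and keep the first result. This is a minimal sketch; `predict_one` is a hypothetical helper, not part of ada, and it assumes the two-row code path predicts each row the same way the patched version does.

```r
# Hypothetical helper (not part of ada): avoid the single-row quirk by
# predicting on the same observation twice and keeping only the first result
predict_one <- function(model, newrow, ...) {
  stopifnot(is.data.frame(newrow), nrow(newrow) == 1)
  doubled <- newrow[c(1, 1), , drop = FALSE]  # the same observation twice
  predict(model, newdata = doubled, ...)[1]
}

# With the model from the question (requires library(ada)):
# predict_one(myada, data.frame(a = 4, b = 7))
```

The helper only calls the generic `predict()`, so it works with any model class whose predict method accepts `newdata`.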

Also, in your test data: c() always produces a vector of a single type, so when you combine elements of different types they are coerced to the most general type that can hold all of them (here, logical TRUE/FALSE becomes numeric 1/0). If you're mixing types, use list() instead. For example:

mydata[length(mydata[,1])+1,] = list(a=i,b=j, r= (j > i))
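The coercion can be seen directly with a single row. A small sketch using throwaway data frames (`df_c` and `df_l` are illustrative names) that shows why the `r` column turned into 0/1 and why the list() version keeps it logical:

```r
# c() must produce one atomic type, so the logical TRUE is coerced to 1
row_c <- c(a = 4, b = 7, r = TRUE)
class(row_c)    # "numeric"
row_c[["r"]]    # 1

# list() keeps each element's own type
row_l <- list(a = 4, b = 7, r = TRUE)
class(row_l$r)  # "logical"

# Appending a row: the c() version turns the logical column numeric...
df_c <- data.frame(a = numeric(0), b = numeric(0), r = logical(0))
df_c[nrow(df_c) + 1, ] <- c(a = 4, b = 7, r = TRUE)
class(df_c$r)   # "numeric"

# ...while the list() version leaves it logical
df_l <- data.frame(a = numeric(0), b = numeric(0), r = logical(0))
df_l[nrow(df_l) + 1, ] <- list(a = 4, b = 7, r = TRUE)
class(df_l$r)   # "logical"
```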

Upvotes: 2
