HHH
HHH

Reputation: 6475

Invalid probabilities in xgboost in Spark

I'm using XGBoost on Spark (Scala API). I'm training my model with the following parameters:

val params = Map("eta" -> "0.1", "max_depth" -> "2",
                 "silent" -> "1", "objective" -> "binary:logistic")
// 10 boosting rounds, 10 workers
val model = XGBoost.train(trainRDD, params, 10, 10)

The model then provides two predict functions for scoring: one that takes a DMatrix and runs locally on the driver, and one that takes an RDD[Vector] and runs in distributed mode.

For the same test data set, the two functions return different values: the local one returns values such as -1.23 or 1.34, while the distributed one returns values such as 0.21 or 0.71. The second appears to return probabilities between 0 and 1, but the first returns something else.

Could someone elaborate on this?

Upvotes: 2

Views: 284

Answers (1)

HHH
HHH

Reputation: 6475

I found the issue. The predict function that operates locally outputs margin values, i.e. the raw scores before the logistic link is applied. To get probabilities, we need to apply the logistic transformation, p = 1 / (1 + exp(-margin)), to those values.
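For reference, here is a minimal sketch of that transformation in plain Scala. It does not call any XGBoost API. The object name `MarginToProbability` and the helper `toProbabilities` are made up for illustration, and it assumes the local predictions have already been flattened into a sequence of floats.

```scala
// Sketch: convert raw margin scores (log-odds) into probabilities.
// Names here are hypothetical helpers, not part of xgboost4j.
object MarginToProbability {
  /** Logistic (sigmoid) function: maps any real-valued margin into (0, 1). */
  def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  /** Applies the sigmoid to every margin returned by the local predict call. */
  def toProbabilities(margins: Seq[Float]): Seq[Double] =
    margins.map(m => sigmoid(m.toDouble))

  def main(args: Array[String]): Unit = {
    // Example margins taken from the question.
    val margins = Seq(-1.23f, 1.34f)
    toProbabilities(margins).foreach(println)
  }
}
```

With the margins from the question, -1.23 and 1.34 map to roughly 0.23 and 0.79, which is the same 0-to-1 range the distributed predict function returns.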

Upvotes: 1
