Combining binary classification algorithms

Question

I have several algorithms that solve a binary classification problem (response 0 or 1) by assigning each observation a probability that the target value equals 1. All the algorithms try to minimize the log loss function

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$$

where N is the number of observations, y_i is the actual target value and p_i is the probability of 1 predicted by the algorithm. Here is some R code with sample data:

actual.response = c(1,0,0,0,1)
prediction.df = data.frame(
  method1 = c(0.5080349,0.5155535,0.5338271,0.4434838,0.5002529),
  method2 = c(0.5229466,0.5298336,0.5360780,0.4217748,0.4998602),
  method3 = c(0.5175378,0.5157711,0.5133765,0.4372109,0.5215695),
  method4 = c(0.5155535,0.5094510,0.5201827,0.4351625,0.5069823)
)

# Mean log loss per method (one value per column of prediction.df)
log.loss = colSums(
  -1/length(actual.response) *
    (actual.response*log(prediction.df) + (1-actual.response)*log(1-prediction.df))
)

The sample code gives the log loss for each algorithm:

  method1   method2   method3   method4 
0.6887705 0.6824404 0.6659796 0.6719181 

Now I want to combine these algorithms so that I can reduce the log loss even further. Is there an R package which can do this for me? I would appreciate references to any algorithms, articles, books or research papers which address this kind of problem. Note that as a final result I want the predicted probabilities of each class, not plain 0/1 responses.
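
To make the goal concrete, here is a minimal sketch of one possible way to combine the methods, not necessarily the best one: a weighted average of the four probability columns, with non-negative weights that sum to 1 and are chosen to minimize the log loss. It uses only base R (`optim`). The helper names `log.loss.fn`, `to.weights` and `blend.loss` are made up for this illustration, and fitting the weights on the same five observations they are scored on is an assumption made for brevity; in practice the weights should be fitted on held-out predictions.

```r
# Sample data copied from the code above.
actual.response <- c(1, 0, 0, 0, 1)
prediction.df <- data.frame(
  method1 = c(0.5080349, 0.5155535, 0.5338271, 0.4434838, 0.5002529),
  method2 = c(0.5229466, 0.5298336, 0.5360780, 0.4217748, 0.4998602),
  method3 = c(0.5175378, 0.5157711, 0.5133765, 0.4372109, 0.5215695),
  method4 = c(0.5155535, 0.5094510, 0.5201827, 0.4351625, 0.5069823)
)
pred.mat <- as.matrix(prediction.df)

# Mean log loss of probabilities p against 0/1 labels y.
# Probabilities are clipped away from 0 and 1 to keep log() finite.
log.loss.fn <- function(p, y) {
  eps <- 1e-15
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Map unconstrained parameters to non-negative weights summing to 1 (softmax),
# so that a general-purpose optimizer can be used without constraints.
to.weights <- function(theta) {
  w <- exp(theta - max(theta))
  w / sum(w)
}

# Log loss of the weighted average for a given parameter vector.
blend.loss <- function(theta) {
  log.loss.fn(as.vector(pred.mat %*% to.weights(theta)), actual.response)
}

# theta = 0 corresponds to equal weights, i.e. the plain average of the methods.
theta.start <- rep(0, ncol(pred.mat))
equal.loss <- blend.loss(theta.start)

fit <- optim(theta.start, blend.loss, method = "BFGS")

blend.weights <- setNames(to.weights(fit$par), colnames(pred.mat))
blend.pred <- as.vector(pred.mat %*% blend.weights)   # combined probabilities of class 1
blend.train.loss <- log.loss.fn(blend.pred, actual.response)

print(blend.weights)
print(c(equal.weights = equal.loss, fitted.weights = blend.train.loss))
```

Because the combined prediction is a convex combination of the inputs, it always stays between the smallest and largest individual prediction for each observation, so the output is still a valid probability rather than a 0/1 label. With only five observations the fitted weights will overfit heavily, which is why the in-sample loss printed here should not be read as an estimate of out-of-sample performance.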
