How to use Leave-one-Out method to predict Y with multiple columns using SKlearn?

Question

I have a sample dataframe that looks like the one below. The Y columns (y_1 to y_48) all contain binary 0/1 outcomes. The X columns run from x_1 to x_13.

     x_1  x_2  ...  x_13  y_1  y_2  y_3  ...  y_48
1    0.1  0.2  ...  0.1   0    1    0    ...  0
2    0.5  0.2  ...  0.2   1    0    1    ...  1
...
100  0.1  0.0  ...  0.5   0    1    0    ...  0

I am new to machine learning. I plan to use leave-one-out cross-validation to calculate the F1 score. Without leave-one-out, I can use the code below:

accs = []

for i in range(48):
    Y = df['y_{}'.format(i+1)]
    model = RandomForest()
    model.fit(X, Y)
    predicts = model.predict(X)
    accs.append(f1(predicts,Y))
    
print(accs)

This prints [1, 1, 1, ..., 1]. How do I incorporate leave-one-out so that it prints an average F1 score instead, such as 0.45?

StupidWolf · Accepted Answer

Example dataset:

import pandas as pd
import numpy as np
np.random.seed(111)

df = pd.concat([
    pd.DataFrame(np.random.uniform(0, 1, (100, 10)),
                 columns=["x_" + str(i) for i in np.arange(1, 11)]),
    pd.DataFrame(np.random.binomial(1, 0.5, (100, 5)),
                 columns=["y_" + str(i) for i in np.arange(1, 6)])
], axis=1)

X = df.filter(like="x_")

Then to fit, you can use cross_val_predict with KFold to get an out-of-fold prediction for every observation. Setting the number of splits equal to the number of observations makes this leave-one-out cross-validation:

from sklearn.model_selection import cross_val_predict, KFold
from sklearn.ensemble import RandomForestClassifier 
from sklearn.metrics import f1_score

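accs = []    # F1 on the training data (optimistic, kept for comparison)
result = []  # F1 from the leave-one-out predictions
loocv = KFold(len(X))

for i in range(5):
    Y = df['y_{}'.format(i+1)]
    model = RandomForestClassifier()
    # each row is predicted by a model that was fitted without that row
    fold_pred = cross_val_predict(model, X, Y, cv=loocv)
    result.append(f1_score(Y, fold_pred))

    model.fit(X, Y)
    predicts = model.predict(X)
    accs.append(f1_score(Y, predicts))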

print(result)
[0.5, 0.5871559633027522, 0.5585585585585585, 0.5585585585585585, 0.5871559633027522]
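If you want a single averaged number, as asked in the question, one option is to take the mean of the per-column scores. The sketch below is only an illustration under a few assumptions: it uses its own small synthetic dataset (60 rows, 3 targets) rather than your data, swaps KFold(len(X)) for LeaveOneOut, which produces the same splits, and fixes n_estimators and random_state just to keep the run quick and repeatable. Note that F1 is computed once over all pooled leave-one-out predictions for a column, because a single held-out row cannot give a meaningful F1 on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in data (an assumption, not the asker's dataframe)
rng = np.random.default_rng(111)
X = pd.DataFrame(rng.uniform(0, 1, (60, 10)),
                 columns=[f"x_{i}" for i in range(1, 11)])
Y_all = pd.DataFrame(rng.binomial(1, 0.5, (60, 3)),
                     columns=[f"y_{i}" for i in range(1, 4)])

loo = LeaveOneOut()  # one split per row, same as KFold(len(X))
scores = []
for col in Y_all.columns:
    # small forest and fixed seed only to keep this sketch fast and repeatable
    model = RandomForestClassifier(n_estimators=20, random_state=0)
    # one held-out prediction per row, pooled into a single array
    pred = cross_val_predict(model, X, Y_all[col], cv=loo)
    scores.append(f1_score(Y_all[col], pred))

mean_f1 = float(np.mean(scores))  # average F1 across the target columns
print(scores)
print(mean_f1)
```

With random labels like these, the scores should land far below the 1.0 you get from predicting on the training data. With 48 targets and 100 rows this means 4800 model fits, so expect it to take a while with a full-size forest.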
