I have a sample dataframe that looks like the one below. All Y columns contain binary (0/1) outcomes, and X consists of the columns x_1 to x_13.
     x_1  x_2  ...  x_13  y_1  y_2  y_3  ...  y_48
1    0.1  0.2  ...  0.1   0    1    0    ...  0
2    0.5  0.2  ...  0.2   1    0    1    ...  1
...
100  0.1  0.0  ...  0.5   0    1    0    ...  0
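A minimal sketch of separating the feature columns from the outcome columns by name prefix. It builds a hypothetical random dataframe with the same shape as the one above (100 rows, x_1..x_13, y_1..y_48); the values are invented, only the column layout matches the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the question's shape: 100 rows,
# 13 continuous features x_1..x_13 and 48 binary outcomes y_1..y_48.
rng = np.random.default_rng(0)
n_rows, n_x, n_y = 100, 13, 48
df = pd.concat(
    [
        pd.DataFrame(rng.uniform(0, 1, (n_rows, n_x)),
                     columns=[f"x_{i}" for i in range(1, n_x + 1)]),
        pd.DataFrame(rng.integers(0, 2, (n_rows, n_y)),
                     columns=[f"y_{i}" for i in range(1, n_y + 1)]),
    ],
    axis=1,
)

# Anchored regexes select each column group by its name prefix.
X = df.filter(regex=r"^x_\d+$")
Y = df.filter(regex=r"^y_\d+$")
print(X.shape, Y.shape)  # (100, 13) (100, 48)
```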
I am new to machine learning. I plan to use leave-one-out cross-validation to calculate the F1 score. Without leave-one-out, I can use the code below:
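from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

X = df.filter(like="x_")  # feature columns x_1 ... x_13 of the dataframe above
accs = []
for i in range(48):
    Y = df['y_{}'.format(i+1)]
    model = RandomForestClassifier()
    model.fit(X, Y)
    predicts = model.predict(X)  # predicting on the same rows used for training
    accs.append(f1_score(Y, predicts))  # f1_score expects (y_true, y_pred)
print(accs)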
This prints [1, 1, 1, ..., 1]. How do I incorporate leave-one-out cross-validation so that I print just a single average F1 score, such as 0.45?
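The all-ones result is what you would expect when a random forest is scored on the same rows it was trained on: with the default settings the trees are grown until their leaves are pure, so they can largely memorize the training labels. A minimal sketch, assuming hypothetical random data with no real signal and using a single train/test split (not leave-one-out) purely to show the gap between in-sample and held-out F1:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical data: random features and random labels, so there is nothing to learn.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (200, 13))
y = rng.integers(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Scoring on the rows the model was trained on rewards memorization...
train_f1 = f1_score(y_train, model.predict(X_train))
# ...while scoring on unseen rows estimates how the model generalizes.
test_f1 = f1_score(y_test, model.predict(X_test))
print(f"train F1 = {train_f1:.2f}, held-out F1 = {test_f1:.2f}")
```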
Example dataset:
import pandas as pd
import numpy as np
np.random.seed(111)
df = pd.concat([
    pd.DataFrame(np.random.uniform(0, 1, (100, 10)),
                 columns=["x_" + str(i) for i in np.arange(1, 11)]),
    pd.DataFrame(np.random.binomial(1, 0.5, (100, 5)),
                 columns=["y_" + str(i) for i in np.arange(1, 6)])
], axis=1)
X = df.filter(like="x_")
Then, to fit, you can use cross_val_predict with KFold to get an out-of-fold prediction for every observation. Set the number of splits equal to the number of observations, so that each fold holds out exactly one row; this is leave-one-out:
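scikit-learn also has a dedicated LeaveOneOut splitter. A minimal check, on five hypothetical rows, that KFold with n_splits equal to the number of rows (and no shuffling) produces exactly the same held-out rows in the same order, so either object can be passed as cv to cross_val_predict:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(10).reshape(5, 2)  # 5 hypothetical observations

kfold_tests = [test.tolist() for _, test in KFold(n_splits=len(X)).split(X)]
loo_tests = [test.tolist() for _, test in LeaveOneOut().split(X)]

# Both splitters hold out exactly one row per fold, in the same order.
print(kfold_tests)               # [[0], [1], [2], [3], [4]]
print(kfold_tests == loo_tests)  # True
```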
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

accs = []    # in-sample F1, for comparison
result = []  # leave-one-out F1
loocv = KFold(n_splits=len(X))  # one held-out row per fold
for i in range(5):
    Y = df['y_{}'.format(i+1)]
    model = RandomForestClassifier()
    # each row is predicted by a model trained on the other len(X) - 1 rows
    fold_pred = cross_val_predict(model, X, Y, cv=loocv)
    result.append(f1_score(Y, fold_pred))
    # for comparison: fitting and predicting on the same data
    model.fit(X, Y)
    predicts = model.predict(X)
    accs.append(f1_score(Y, predicts))
print(result)
[0.5, 0.5871559633027522, 0.5585585585585585, 0.5585585585585585, 0.5871559633027522]
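The same idea written out by hand, as a sketch of what happens under the hood: F1 computed on a single held-out row is not meaningful, so the held-out predictions are pooled over all n folds and scored once per outcome, and the per-outcome scores are then averaged with np.mean to give the single number asked for in the question. The helper name loo_f1, its parameters, and the small dataset are hypothetical, and n_estimators is reduced only to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut

def loo_f1(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: leave-one-out F1 from pooled held-out predictions."""
    X, y = np.asarray(X), np.asarray(y)
    pred = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred[test_idx] = model.predict(X[test_idx])  # predict the one held-out row
    # Score once over all n held-out predictions.
    return f1_score(y, pred)

# Hypothetical small dataset: 40 rows, 4 features, 3 binary outcomes.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.uniform(0, 1, (40, 4)),
                  columns=[f"x_{i}" for i in range(1, 5)])
for j in range(1, 4):
    df[f"y_{j}"] = rng.integers(0, 2, 40)

X = df.filter(like="x_")
scores = [loo_f1(X, df[f"y_{j}"]) for j in range(1, 4)]
print(scores)
print("mean F1:", np.mean(scores))  # single average across the outcomes
```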