Reputation: 13
I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies each trade in the test set as good or bad (1/0). I want to see which trades fall into each group, then take the profit/loss of the trades classified as "good" to find out whether the model is actually profitable. Is there a way to view the row-specific info of the entries that the model classifies as True/False?
This is what my code looks like for my data scaling and splitting into train/test set:
x = df[x_train_features]
y = df["y"]
y = y.astype("int")
# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)
# split training data into test and training sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)
# instantiate the model (using the default parameters)
logreg = LogisticRegression()
# fit the model with data
logreg.fit(X_train, y_train)
y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)
I tried to use df.loc[y_pred_test == True], but I get the error:
Boolean index has wrong length: 720 instead of 2880
most likely because the test set is smaller than the whole sample set.
Upvotes: 1
Views: 102
Reputation: 261
The error occurs because y_pred_test has one entry per test row (720), while df has 2880 rows, so the boolean mask cannot be applied to df directly. Since y is a pandas Series, y_test keeps its original df index after train_test_split, and you can use that index to align the predictions with their rows:
y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="pred")
results = pd.concat([y_test, y_pred_test], axis=1)
This combines your prediction values with the ground truth, labelled by the original row index. Then you can select the rows classified as good, or look up their full rows in df:
results[results["pred"]]
df.loc[results[results["pred"]].index]
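One way to line the test-set predictions up with their original rows is to reuse the index that y_test keeps after the split. The following is only a minimal sketch under assumptions: the df here is synthetic, the feature columns f1 and f2 are hypothetical, and pnl is a hypothetical column standing in for the per-trade profit/loss.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real df; "f1", "f2" and "pnl" are hypothetical columns.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "pnl": rng.normal(scale=100.0, size=n),
})
df["y"] = (df["f1"] + 0.5 * df["f2"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

x = df[["f1", "f2"]]
y = df["y"]

# Same steps as in the question: scale, split, fit, threshold at 0.5.
scaledx = MinMaxScaler().fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(
    scaledx, y, test_size=0.25, random_state=0
)
logreg = LogisticRegression().fit(X_train, y_train)
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5

# y_test is a Series, so it still carries the original df labels,
# in the same order as the rows of X_test.
results = df.loc[y_test.index].copy()
results["pred"] = y_pred_test

good = results[results["pred"]]    # trades classified as good
bad = results[~results["pred"]]    # trades classified as bad

# Total profit/loss of the trades the model would have taken.
total_pnl_good = good["pnl"].sum()
print(good.head())
print("P/L of trades classified as good:", total_pnl_good)
```

Because results is built from df.loc[y_test.index], every column of the original row is available next to the prediction, so no mask of the wrong length is ever applied to the full df.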
Upvotes: 0