Logistic regression, second column of confusion matrix shows zeros

I want to use logistic regression to see how a person's bank account balance and age relate to their ability to buy a house. After fitting my model, I get a confusion matrix like this:

array([[1006,    0],
   [ 125,    0]])

The same thing happened when I tried the regression on other data. Here is the code:

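# imports
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# importing dataset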
dataset = pd.read_csv('/home/stayal0ne/Machine-learning/datasets/bank.csv', sep=';')
dataset['age'] = dataset['age'].astype(float)
dataset['balance'] = dataset['balance'].astype(float)
X = dataset.iloc[:, [0, 5]].values
y = dataset.iloc[:, -1].values

# encoding categorical data (done before the split so that
# y_train and y_test hold the encoded labels)
label_encoder_y = LabelEncoder()
y = label_encoder_y.fit_transform(y)

# splitting the dataset into the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

# feature scaling
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)

# Fitting classifier into the training set
classifier = LogisticRegression(random_state=42)
classifier.fit(X_train, y_train)

# Prediction
y_predicted = classifier.predict(X_test)

# Checking the accuracy
con_matrix = confusion_matrix(y_test, y_predicted)

Any help will be appreciated.

Upvotes: 3

Answers (3)

Anique Azhar

Reputation: 51

Add this line:

y_predicted = np.round(y_predicted)

before this one:

con_matrix = confusion_matrix(y_test, y_predicted)

and run it again.

Upvotes: 0

AntoineP

Reputation: 3073

The documentation for confusion_matrix says:

By definition, entry i, j in a confusion matrix is the number of observations actually in group i, but predicted to be in group j.

So in your example, 1006 samples of class 0 were predicted as class 0, and 125 samples of class 1 were also predicted as class 0.

This means your model assigns every sample in your test set to class 0.
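
To see this for yourself, here is a minimal sketch. The labels are made up to reproduce the counts in the question (they are not your real data): rows of the matrix are actual classes, columns are predicted classes, so a column that sums to zero is a class the model never predicted.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels reproducing the counts from the question:
# 1006 actual negatives, 125 actual positives, every prediction is class 0.
y_true = np.array([0] * 1006 + [1] * 125)
y_pred = np.zeros_like(y_true)

# labels=[0, 1] makes the row/column order explicit.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

# Column sums count how often each class was predicted;
# a zero here means that class never appears in the predictions.
predicted_per_class = cm.sum(axis=0)
print(predicted_per_class)

# The same check made directly on the predictions.
print(np.unique(y_pred, return_counts=True))
```

Running the same np.unique check on your own y_predicted should tell you right away whether class 1 is ever predicted.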

Upvotes: 0

Eliethesaiyan

Reputation: 2322

The values in con_matrix are laid out, row by row, as tn, fp, fn, tp.

Your true negatives are 1006: people who can't buy a house and whom the model correctly predicts as unable to. Your false positives are 0, meaning the model never predicted that someone can buy a house when in reality they can't.

Your false negatives are 125: these people can in reality afford to buy a house, but your model says they can't. Your true positives are also 0, meaning the model never correctly identified a person who can afford a house.
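
As a rough sketch, the four counts can be unpacked from the matrix and turned into metrics. The matrix below is copied from the question, and the variable names are just for illustration. It shows why accuracy looks fine here while recall for class 1 is zero.

```python
import numpy as np

# Confusion matrix copied from the question (rows: actual, columns: predicted).
con_matrix = np.array([[1006, 0],
                       [125, 0]])

# For a binary problem, ravel() flattens the matrix row by row into tn, fp, fn, tp.
tn, fp, fn, tp = con_matrix.ravel()

# Accuracy is about 0.89 only because class 0 dominates the test set.
accuracy = (tp + tn) / con_matrix.sum()

# Recall for class 1: the share of actual positives that the model found.
recall = tp / (tp + fn)

# Precision is undefined when nothing is predicted positive (tp + fp == 0).
precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")

print(tn, fp, fn, tp)
print(accuracy, recall, precision)
```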

My overall guess is that you have far more people who can't buy a house than people who can, and that the features (bank balance, age) look similar for both groups.

I would advise setting the class_weight parameter if the dataset is imbalanced. For example, if label 0 means "not able to buy a house" and you have 90 such records against 10 records of people who can, set class_weight={0: 0.1}.
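
Here is a minimal sketch of the effect, assuming synthetic data rather than your bank.csv: 900 negatives and 100 positives with one heavily overlapping feature. It uses class_weight='balanced', which lets scikit-learn derive the weights from class frequencies instead of a hand-written dictionary; treat it as an illustration, not a tuned solution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Made-up data: 900 negatives and 100 positives whose single feature
# overlaps heavily, mimicking an imbalanced, weakly separable problem.
X = np.concatenate([rng.normal(0.0, 1.0, 900),
                    rng.normal(1.0, 1.0, 100)]).reshape(-1, 1)
y = np.array([0] * 900 + [1] * 100)

# stratify=y keeps the 90/10 class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Same model twice: once unweighted, once with class weights
# inversely proportional to class frequencies.
default_clf = LogisticRegression(random_state=42).fit(X_train, y_train)
balanced_clf = LogisticRegression(
    class_weight='balanced', random_state=42).fit(X_train, y_train)

cm_default = confusion_matrix(y_test, default_clf.predict(X_test), labels=[0, 1])
cm_balanced = confusion_matrix(y_test, balanced_clf.predict(X_test), labels=[0, 1])

print(cm_default)   # second column: predictions of class 1 without weighting
print(cm_balanced)  # second column: predictions of class 1 with weighting
```

The weighted model trades some false positives for fewer false negatives, so compare recall and precision rather than accuracy alone when deciding whether it helps.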

Upvotes: 2
