Reputation: 77
I wanted to use logistic regression to see how a person's bank account balance and age relate to their ability to buy a house. After fitting my regression model, I get the following confusion matrix:
array([[1006, 0],
[ 125, 0]])
The same thing happened when I ran the regression on other data. Here is the code:
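import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# importing dataset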
dataset = pd.read_csv('/home/stayal0ne/Machine-learning/datasets/bank.csv', sep=';')
dataset['age'] = dataset['age'].astype(float)
dataset['balance'] = dataset['balance'].astype(float)
X = dataset.iloc[:, [0, 5]].values
y = dataset.iloc[:, -1].values
# encoding categorical data (before the split, so y_train and y_test are encoded too)
label_encoder_y = LabelEncoder()
y = label_encoder_y.fit_transform(y)
# splitting the dataset into the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
# feature scaling
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)
# Fitting classifier into the training set
classifier = LogisticRegression(random_state=42)
classifier.fit(X_train, y_train)
# Prediction
y_predicted = classifier.predict(X_test)
# Building the confusion matrix
con_matrix = confusion_matrix(y_test, y_predicted)
Any help will be appreciated.
Upvotes: 3
Views: 3158
Reputation: 51
Add this line:
y_predicted = np.round(y_predicted)
before this one:
con_matrix = confusion_matrix(y_test, y_predicted)
and run it again.
Upvotes: 0
Reputation: 3073
The documentation for the confusion matrix says:
By definition, entry i, j in a confusion matrix is the number of observations actually in group i, but predicted to be in group j.
So, in your example, 1006 samples of class 0 were predicted to be in class 0, and 125 samples of class 1 were also predicted to be in class 0.
In other words, your model assigns every sample in your test set to class 0.
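To make the indexing concrete, here is a minimal pure-Python sketch that builds the matrix straight from that definition. The labels are synthetic, chosen only to reproduce the counts in the question (assuming class 0 is "cannot buy" and class 1 is "can buy"), and `build_confusion_matrix` is a hypothetical helper, not the scikit-learn function.

```python
def build_confusion_matrix(y_true, y_pred, n_classes=2):
    """Entry [i][j] counts samples whose true class is i and predicted class is j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Synthetic labels matching the question: 1006 true zeros, 125 true ones,
# and a model that predicts class 0 for every single sample.
y_true = [0] * 1006 + [1] * 125
y_pred = [0] * len(y_true)

matrix = build_confusion_matrix(y_true, y_pred)
print(matrix)  # [[1006, 0], [125, 0]]
```

The second column is all zeros because nothing was ever predicted as class 1.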
Upvotes: 0
Reputation: 2322
The entries of con_matrix, read row by row, are tn, fp, fn, tp.
Your true negatives are 1006: people who can't buy a house and whom the model correctly classified as such. Your false positives are 0, meaning the model never predicted that someone can buy a house when in reality they can't.
Your false negatives are 125: these people can in reality afford to buy a house, but your model says they can't. Your true positives are also 0, meaning the model never correctly identified a person who can afford to buy a house.
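As a quick check on those four numbers, here is a small sketch that derives accuracy, recall and precision from them. The counts are copied from the question; reporting precision as 0.0 when nothing is predicted positive is a convention assumed here, since the ratio is strictly undefined in that case.

```python
# Counts taken from the confusion matrix in the question.
tn, fp, fn, tp = 1006, 0, 125, 0

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Recall: share of people who can buy a house that the model actually found.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision: share of positive predictions that were right.
# Undefined when there are no positive predictions; 0.0 is an assumed fallback.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
# accuracy=0.889 recall=0.000 precision=0.000
```

So the accuracy looks decent even though the model never finds a single buyer, which is why accuracy alone is misleading here.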
My overall guess is that you have far more people who can't buy a house than people who can, and that the features (bank balance, age) probably look similar for both groups.
I would advise you to set the class_weight parameter in case the dataset is imbalanced. If the class label is 0 for not able to buy a house, then set {0: 0.1} when you have, say, 90 records of people not able to buy a house and 10 records of people able to buy one.
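If you'd rather not pick the weights by hand, one common heuristic is to weight each class inversely to its frequency. Below is a minimal pure-Python sketch of that idea, using the 90/10 split from the example above; `balanced_class_weights` is a hypothetical helper name, and the labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 90 people who cannot buy a house (0), 10 who can (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)  # the rare class 1 gets a much larger weight than class 0
```

A dict like this can be passed as LogisticRegression(class_weight=weights), and class_weight='balanced' asks scikit-learn to apply the same formula itself. Whether it actually helps on your data is something you'd have to check.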
Upvotes: 2