Reputation: 20302
I am testing the sample code below. All classification results are pretty good and quite reasonable (80% or more), while all regression results are abysmal and quite abnormal (around 20%). Why would this be? I must be doing something wrong, but I can't see what is off here.
import pandas as pd
import numpy as np
#reading the dataset
df=pd.read_csv("C:\\my_path\\train.csv")
#filling missing values
df['Gender'].fillna('Male', inplace=True)
df = df.fillna(0)  #fillna returns a new frame, so assign it back
df.Loan_Status.replace(('Y', 'N'), (1, 0), inplace=True)
#split dataset into train and test
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.3, random_state=0)
x_train=train.drop(['Loan_Status','Loan_ID'],axis=1)
y_train=train['Loan_Status']
x_test=test.drop(['Loan_Status','Loan_ID'],axis=1)
y_test=test['Loan_Status']
#create dummies
x_train=pd.get_dummies(x_train)
x_test=pd.get_dummies(x_test)
#align test columns with train in case a category is missing from one split
x_test=x_test.reindex(columns=x_train.columns, fill_value=0)
# Bagging Classifier
from sklearn.ensemble import BaggingClassifier
from sklearn import tree
model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
model.fit(x_train, y_train)
model.score(x_test,y_test)
# Bagging Regressor
from sklearn.ensemble import BaggingRegressor
model = BaggingRegressor(tree.DecisionTreeRegressor(random_state=1))
model.fit(x_train, y_train)
model.score(x_test,y_test)
# AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier
model = AdaBoostClassifier(random_state=1)
model.fit(x_train, y_train)
model.score(x_test,y_test)
# AdaBoostRegressor
from sklearn.ensemble import AdaBoostRegressor
model = AdaBoostRegressor()
model.fit(x_train, y_train)
model.score(x_test,y_test)
# GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingClassifier
model= GradientBoostingClassifier(learning_rate=0.01,random_state=1)
model.fit(x_train, y_train)
model.score(x_test,y_test)
# GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor
model= GradientBoostingRegressor()
model.fit(x_train, y_train)
model.score(x_test,y_test)
# XGBClassifier
import xgboost as xgb
model=xgb.XGBClassifier(random_state=1,learning_rate=0.01)
model.fit(x_train, y_train)
model.score(x_test,y_test)
# XGBRegressor
import xgboost as xgb
model=xgb.XGBRegressor()
model.fit(x_train, y_train)
model.score(x_test,y_test)
The sample data is from the link below.
https://www.kaggle.com/wendykan/lending-club-loan-data
Finally, here is a small sample of what I am seeing.
# Bagging Regressor
from sklearn.ensemble import BaggingRegressor
regressor = BaggingRegressor()
regressor.fit(x_train,y_train)
accuracy = regressor.score(x_test,y_test)
print(accuracy*100,'%')
# result:
13.022388059701505 %
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train,y_train)
accuracy = regressor.score(x_test,y_test)
print(accuracy*100,'%')
# result:
29.836209522493196 %
Upvotes: 0
Views: 164
Reputation: 90
Regression and classification are two different tasks. From your code it seems that you are trying to fit a regressor on the same data as the classifier. Basically, regressors try to find a function that best predicts an output number from its inputs, so the target values should come from a continuous space, not categories. Note also that .score() on a scikit-learn regressor returns the R² coefficient of determination, not accuracy, so the ~20% figures you are printing are not a fraction of correct predictions. For instance, you may want to predict a borrower's income based on the amount of money they borrow.
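Here is a minimal, self-contained sketch with synthetic data (nothing from your file) showing the two meanings of .score(): the same 0/1 target gives a decent accuracy for a classifier but a much lower R² for a regressor, which is exactly the pattern you report. The exact numbers will vary.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

#synthetic noisy binary target, similar in spirit to Loan_Status
rng = np.random.RandomState(0)
X = rng.rand(500, 5)
y = (X[:, 0] + 0.3 * rng.randn(500) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
print("classifier .score (accuracy):", clf.score(X_te, y_te))

reg = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)
print("regressor .score (R^2):", reg.score(X_te, y_te))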
Check this Medium page for more info on the difference between regression and classification.
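If you want a meaningful regression score, point the regressor at a continuous target. Here is a hypothetical sketch of the setup suggested above; the column names 'ApplicantIncome' and 'LoanAmount' are assumptions for illustration, not confirmed columns of your file.
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("C:\\my_path\\train.csv")
X = df[['LoanAmount']].fillna(0)      #continuous feature (assumed column)
y = df['ApplicantIncome'].fillna(0)   #continuous target, not a category (assumed column)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

reg = BaggingRegressor().fit(X_tr, y_tr)
print("R^2:", reg.score(X_te, y_te))  #R^2 of a genuinely continuous prediction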
Upvotes: 1