I want to predict electricity consumption using a random forest. After cleaning up the data, its current state is as follows:
X=df[['Temp(⁰C)','Araç Sayısı (adet)','Montaj V362_WH','Montaj V363_WH','Montaj_Temp','avg_humidity']]
X.head(15)
Output:
Temp(⁰C) Araç Sayısı (adet) Montaj V362_WH Montaj V363_WH Montaj_Temp avg_humidity
0 3.250000 0.0 0.0 0.0 17.500000 88.250000
1 3.500000 868.0 16.0 18.0 20.466667 82.316667
2 3.958333 774.0 18.0 18.0 21.166667 87.533333
3 6.541667 0.0 0.0 0.0 18.900000 83.916667
4 4.666667 785.0 16.0 18.0 20.416667 72.650000
5 2.458333 813.0 18.0 18.0 21.166667 73.983333
6 -0.458333 804.0 16.0 18.0 20.500000 72.150000
7 -1.041667 850.0 16.0 16.0 19.850000 76.433333
8 -0.375000 763.0 16.0 18.0 20.500000 76.583333
9 4.375000 1149.0 16.0 16.0 21.416667 84.300000
10 8.541667 0.0 0.0 0.0 21.916667 71.650000
11 6.625000 763.0 16.0 18.0 22.833333 73.733333
12 5.333333 783.0 16.0 16.0 22.166667 69.250000
13 4.708333 764.0 16.0 18.0 21.583333 66.800000
14 4.208333 813.0 16.0 16.0 20.750000 68.150000
y.head(15)
Output:
Montaj_ET_kWh/day
0 11951.0
1 41821.0
2 42534.0
3 14537.0
4 41305.0
5 42295.0
6 44923.0
7 44279.0
8 45752.0
9 44432.0
10 25786.0
11 42203.0
12 40676.0
13 39980.0
14 39404.0
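from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=None)
clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train['Montaj_ET_kWh/day'])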
feature_list = list(X.columns)
for feature in zip(feature_list, clf.feature_importances_):
    print(feature)
Output:
('Temp(⁰C)', 0.11598075020423881)
('Araç Sayısı (adet)', 0.7047301384616493)
('Montaj V362_WH', 0.04065706901940535)
('Montaj V363_WH', 0.023077554218712878)
('Montaj_Temp', 0.08082006262985514)
('avg_humidity', 0.03473442546613837)
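Rather than keeping a separate `feature_list` in sync with `X`, the importances can be labelled with the DataFrame's own column names and sorted. A minimal sketch on synthetic data (the column names `temp`, `vehicles`, `humidity` and the target formula are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data standing in for the question's DataFrame.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "temp": rng.normal(5, 3, 200),
    "vehicles": rng.integers(0, 1200, 200).astype(float),
    "humidity": rng.uniform(60, 90, 200),
})
# Assumed target: driven mostly by "vehicles", weakly by "temp", not by "humidity".
y = 10000 + 30 * X["vehicles"] - 200 * X["temp"] + rng.normal(0, 500, 200)

model = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=-1)
model.fit(X, y)

# Attach each importance to its column name and sort, largest first.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

With the real data, `pd.Series(clf.feature_importances_, index=X.columns)` holds the same values as the loop above, keyed by column name.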
sfm = SelectFromModel(clf, threshold=0.10)
sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature_list_index in sfm.get_support(indices=True):
    print(feature_list[feature_list_index])
Output:
Temp(⁰C)
Araç Sayısı (adet)
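`SelectFromModel` keeps every feature whose importance is at least `threshold`. The sketch below uses synthetic data in which only columns 0 and 1 influence the target (an assumption for illustration) and checks that rule against the importances of the forest that `SelectFromModel` fitted internally:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Hypothetical data: four features, only the first two drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=300)

sfm = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0), threshold=0.10)
sfm.fit(X, y)  # fits a clone of the forest, stored as sfm.estimator_

importances = sfm.estimator_.feature_importances_
selected = sfm.get_support(indices=True)
# Same selection done by hand: keep indices with importance >= threshold.
manual = np.flatnonzero(importances >= 0.10)
print(selected, manual)

# transform() drops the unselected columns.
X_reduced = sfm.transform(X)
print(X_reduced.shape)
```

Because `sfm.fit` trains its own copy of the forest, the question's code trains the 10,000-tree model twice; constructing `SelectFromModel(clf, threshold=0.10, prefit=True)` with the already fitted `clf` and calling `transform` directly avoids the second fit.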
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)
clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train['Montaj_ET_kWh/day'])
y_test=y_test.values
y_pred = clf.predict(X_test)
y_test=y_test.reshape(-1,1)
y_pred=y_pred.reshape(-1,1)
y_test=y_test.ravel()
y_pred=y_pred.ravel()
label_encoder = LabelEncoder()
y_pred = label_encoder.fit_transform(y_pred)
y_test = label_encoder.fit_transform(y_test)
accuracy_score(y_test, y_pred)
Output:
0.010964912280701754
I have no idea why the accuracy is so low. Any idea where I made a mistake?
Your mistake is that you are asking for accuracy (a classification metric) in a regression setting, which is meaningless.
From the accuracy_score documentation (note the word "classification"):
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
Accuracy classification score.
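To see why the number in the question carries no information, here is a small sketch with made-up values (not the question's data). Continuous predictions almost never equal the targets exactly, and because `LabelEncoder` is fitted separately on each array, every value is replaced by its rank within its own array, so the encoded "accuracy" only measures whether the two orderings happen to line up position by position:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical daily consumption values and regression predictions:
# close to the targets, but never exactly equal.
y_test = np.array([41821.0, 42534.0, 14537.0, 41305.0])
y_pred = np.array([41790.3, 42600.8, 14600.2, 41250.1])

# Exact-match rate: the only thing "accuracy" can mean for raw floats.
exact_match_rate = np.mean(y_test == y_pred)
print(exact_match_rate)

# Encoding each array separately (as in the question) maps every value
# to its rank among that array's own unique values.
codes_test = LabelEncoder().fit_transform(y_test)
codes_pred = LabelEncoder().fit_transform(y_pred)
print(codes_test, codes_pred)
print(accuracy_score(codes_test, codes_pred))
```

Here no prediction matches its target exactly, yet the encoded score is a perfect 1.0. With real data the ranks rarely line up position by position, which is consistent with near-zero values like the 0.011 in the question.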
Check the list of metrics available in scikit-learn for suitable regression metrics (where you can also confirm that accuracy is used only for classification); for more details, see my answer in "Accuracy Score ValueError: Can't Handle mix of binary and continuous target".
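A minimal sketch of scoring a random forest regressor with regression metrics instead. The data here is synthetic (the coefficients, noise level and `n_estimators=300` are assumptions for illustration), but the metric calls are the ones scikit-learn provides for regression:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's features and daily kWh target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 40000 + 8000 * X[:, 0] - 1500 * X[:, 1] + rng.normal(scale=500, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
reg = RandomForestRegressor(n_estimators=300, random_state=0, n_jobs=-1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)           # mean absolute error, in target units
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # root mean squared error, weights large misses more
r2 = r2_score(y_test, y_pred)                       # 1.0 is perfect; 0.0 equals always predicting the mean
print(f"MAE={mae:.1f}  RMSE={rmse:.1f}  R^2={r2:.3f}")
```

`reg.score(X_test, y_test)` returns the same R² directly. Note that in the question's code `y_pred` comes from `clf` (all six features); to score the two-feature model, the predictions would come from `clf_important.predict(X_important_test)` instead.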