Reputation: 97
The two code snippets below should, in my opinion, produce exactly the same output, but they don't, even though the results differ only marginally. The train/test split is fixed with a specified random_state, which as far as I understand should guarantee reproducible results. The only difference between the two snippets is that code #0 uses an explicit variable for the decision tree model.
Code #0
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
boston = load_boston()
y = boston.target
X = boston.data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=42)
DT_regressor = DecisionTreeRegressor()
DT_model = DT_regressor.fit(X_train, y_train)
y_DT_pred = DT_model.predict(X_test)
def mse(actual, preds):
    delta = np.sum((actual - preds) * (actual - preds))
    return delta / len(preds)
# Check your solution matches sklearn
print('decision trees')
print(mse(y_test, y_DT_pred))
print(mean_squared_error(y_test, y_DT_pred))
print("If the above match, you are all set!")
print('predicted')
print(y_DT_pred)
print('labels')
print(y_test)
Code #1
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
boston = load_boston()
y = boston.target
X = boston.data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=42)
tree_mod = DecisionTreeRegressor()
tree_mod.fit(X_train, y_train)
preds_tree = tree_mod.predict(X_test)
def mse(actual, preds):
    return np.sum((actual - preds) ** 2) / len(actual)
# Check your solution matches sklearn
print(mse(y_test, preds_tree))
print(mean_squared_error(y_test, preds_tree))
print("If the above match, you are all set!")
print('predicted')
print(preds_tree)
print('labels')
print(y_test)
Even after changing to random_state=0, the outputs still differ.
output of code#0
26.12281437125748
26.12281437125748
If the above match, you are all set!
predicted
[23.4 24.5 20.1 11.7 20.7 20.4 21.8 20.5 22.7 16.1 10.8 17.9 14.9 8.8
50. 37. 21.2 32.7 28. 18.9 23.1 22.7 23.1 24.8 19.7 10.9 19.3 13.1
37.6 18.4 12.5 17.7 24.5 23.1 23.2 17.7 8.3 19.5 12.7 17.9 22.9 19.7
23.9 12.5 22. 20.5 22.4 13.8 15.6 28.7 13.8 18.3 18.2 35.2 19. 22.4
21.7 20.7 10.9 19.5 20.6 23.1 34.9 30.1 17.7 32. 16.1 18.9 16.7 21.7
20.6 23.8 23.2 33.1 28.4 8.8 41.7 23.1 22. 21.8 27.1 19.3 20.2 37.6
37.6 25. 19.3 13.8 24.3 14.3 17.5 11.8 23.1 35.1 21.6 23.8 10.2 20.7
14.3 23.1 25. 20.1 33.8 24.5 25. 23.1 8.3 19.5 23.8 22. 23.6 17.9
18.9 18.3 20. 20. 9.5 14.5 9.5 50. 32. 6.3 14.4 21.7 25. 17.3
34.9 22.5 18.9 36.1 12.5 9.5 15.2 19.6 10.5 34.9 20. 15.6 28.6 8.3
10.9 21.8 23.6 24.4 24.2 14.5 37.3 37.3 12.8 6.3 28.4 25. 15.6 32.4
17.4 23.7 17.3 19.7 21.8 13.1 8.3 17.5 34.9 31.6 31. 23.1 23.1]
labels
[22.6 50. 23. 8.3 21.2 19.9 20.6 18.7 16.1 18.6 8.8 17.2 14.9 10.5
50. 29. 23. 33.3 29.4 21. 23.8 19.1 20.4 29.1 19.3 23.1 19.6 19.4
38.7 18.7 14.6 20. 20.5 20.1 23.6 16.8 5.6 50. 14.5 13.3 23.9 20.
19.8 13.8 16.5 21.6 20.3 17. 11.8 27.5 15.6 23.1 24.3 42.8 15.6 21.7
17.1 17.2 15. 21.7 18.6 21. 33.1 31.5 20.1 29.8 15.2 15. 27.5 22.6
20. 21.4 23.5 31.2 23.7 7.4 48.3 24.4 22.6 18.3 23.3 17.1 27.9 44.8
50. 23. 21.4 10.2 23.3 23.2 18.9 13.4 21.9 24.8 11.9 24.3 13.8 24.7
14.1 18.7 28.1 19.8 26.7 21.7 22. 22.9 10.4 21.9 20.6 26.4 41.3 17.2
27.1 20.4 16.5 24.4 8.4 23. 9.7 50. 30.5 12.3 19.4 21.2 20.3 18.8
33.4 18.5 19.6 33.2 13.1 7.5 13.6 17.4 8.4 35.4 24. 13.4 26.2 7.2
13.1 24.5 37.2 25. 24.1 16.6 32.9 36.2 11. 7.2 22.8 28.7 14.4 24.4
18.1 22.5 20.5 15.2 17.4 13.6 8.7 18.2 35.4 31.7 33. 22.2 20.4]
output of code#1
28.135568862275445
28.135568862275445
If the above match, you are all set!
predicted
[23.1 24.5 20.1 19.1 20.7 20.4 21.8 19. 21.8 16.1 10.8 17.9 14.9 8.8
50. 37. 21.2 32.7 24.5 18.9 23.1 21.5 20.1 24.8 19.7 10.9 19.3 15.6
37.6 18.8 12.5 19.1 24.5 23.1 23.9 17.7 7. 19.5 12.7 17.9 22.9 19.7
23.9 12.5 22. 20.5 22.5 13.3 15.6 28.4 13.3 18.4 18.2 21.9 18.4 22.4
21.7 20.7 10.9 19.3 19.4 23.1 35.1 30.1 19.1 32. 16.1 18.9 16.7 21.7
20.6 23.8 23.7 33.1 28.6 7.2 41.7 23.1 22. 21.7 27.1 19.2 20.2 37.6
37.6 25. 19.3 13.8 24.3 14.3 17.5 11.8 23.2 34.9 21.6 23.8 10.9 22.3
14.3 23.1 25. 20.1 30.3 24.5 21. 23.1 8.3 19.9 23.8 22. 23.6 17.9
20. 18.4 18.9 20.7 9.5 14.5 10.2 50. 32. 6.3 14.4 21.7 25. 17.4
34.9 22.5 18.9 37.3 12.7 9.5 15.2 19.6 10.8 34.9 22.2 15.6 28.6 7.
10.9 21.7 23.6 24.4 24.2 16. 37.3 37.3 12.8 8.8 28.6 25.3 14.3 32.5
17.4 23.7 17.4 19.9 21.7 12.7 7. 17.6 35.1 31.5 30.3 23.1 22.1]
labels
[22.6 50. 23. 8.3 21.2 19.9 20.6 18.7 16.1 18.6 8.8 17.2 14.9 10.5
50. 29. 23. 33.3 29.4 21. 23.8 19.1 20.4 29.1 19.3 23.1 19.6 19.4
38.7 18.7 14.6 20. 20.5 20.1 23.6 16.8 5.6 50. 14.5 13.3 23.9 20.
19.8 13.8 16.5 21.6 20.3 17. 11.8 27.5 15.6 23.1 24.3 42.8 15.6 21.7
17.1 17.2 15. 21.7 18.6 21. 33.1 31.5 20.1 29.8 15.2 15. 27.5 22.6
20. 21.4 23.5 31.2 23.7 7.4 48.3 24.4 22.6 18.3 23.3 17.1 27.9 44.8
50. 23. 21.4 10.2 23.3 23.2 18.9 13.4 21.9 24.8 11.9 24.3 13.8 24.7
14.1 18.7 28.1 19.8 26.7 21.7 22. 22.9 10.4 21.9 20.6 26.4 41.3 17.2
27.1 20.4 16.5 24.4 8.4 23. 9.7 50. 30.5 12.3 19.4 21.2 20.3 18.8
33.4 18.5 19.6 33.2 13.1 7.5 13.6 17.4 8.4 35.4 24. 13.4 26.2 7.2
13.1 24.5 37.2 25. 24.1 16.6 32.9 36.2 11. 7.2 22.8 28.7 14.4 24.4
18.1 22.5 20.5 15.2 17.4 13.6 8.7 18.2 35.4 31.7 33. 22.2 20.4]
Upvotes: 0
Views: 275
Reputation: 1869
The model itself also has a random component, so fixing just the split is not enough. Set
DecisionTreeRegressor(random_state=0)
as well.
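A minimal sketch of the idea: seed both the split and the tree, and repeated fits give identical MSE. (I'm using a synthetic dataset from make_regression here instead of load_boston, which is deprecated in recent scikit-learn versions; the shapes roughly match the Boston data.)

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the Boston data (506 samples, 13 features).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Seed the tree itself, not just the split.
mses = []
for _ in range(3):
    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X_train, y_train)
    mses.append(mean_squared_error(y_test, tree.predict(X_test)))

print(mses)  # all three runs report the same MSE
```

Without random_state, the regressor can break ties between equally good splits differently from run to run, which is why your two snippets diverge even though the data split is fixed.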
If that doesn't help, it would be useful if you posted your results.
Upvotes: 1