Reputation: 478
I need to build a decision tree and represent the data by labels on a graph (as shown in the two illustrations). I have no problem with the decision tree; unfortunately, the dots are not plotted on the graph. I have tried several changes in the code. The code comes from the scikit-learn example Plot the decision surface of a decision tree on the iris dataset.
Here is an example of the data used (X, Y, C5), read from an Excel file:
import xlrd

Path = "Documents/Apprentissage/Python/Script/ClustAllRepres12.xlsx"
Wordbook = xlrd.open_workbook(Path)
Sheet = Wordbook.sheet_by_index(0)

X = []
Y = []
C5 = []
for i in range(1, Sheet.nrows):
    X.append(Sheet.cell_value(i, 0))
    Y.append(Sheet.cell_value(i, 1))
    C5.append(Sheet.cell_value(i, 8))
X
Out[]: [8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0, 12.0, 12.0, 12.0, 12.0, 12.0, 13.0, 13.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 17.0, 18.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 21.0, 21.0, 22.0]
type(X)
Out[]: list
Y
Out[]: [45.0, 17.0, 18.0, 24.0, 25.0, 27.0, 36.0, 38.0, 39.0, 24.0, 37.0, 40.0, 24.0, 31.0, 35.0, 36.0, 37.0, 39.0, 32.0, 33.0, 35.0, 43.0, 27.0, 31.0, 35.0, 42.0, 18.0, 39.0, 43.0, 31.0, 42.0, 28.0, 32.0, 35.0, 43.0, 51.0, 52.0, 17.0, 19.0, 53.0, 49.0, 51.0, 53.0, 58.0, 16.0, 58.0, 59.0, 50.0, 52.0, 54.0]
type(Y)
Out[]: list
C5
Out[]: [2.0, 4.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.0, 4.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 5.0, 1.0, 3.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
type(C5)
Out[]: list
Here is the code adapted from the scikit-learn website:
import pandas as pd
import numpy as np
import xlrd
import matplotlib.pyplot as plt
df_list = pd.DataFrame(
    {'X': X,
     'Y': Y,
     })
df = df_list.iloc[:,0:2].values #transform to float type
import sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree
#parameters
n_classes = 5
plot_colors = "ryb"
plot_step = 0.02
for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = df  # gives better result without [:, pair]
    y = C5

    # train
    clf = DecisionTreeClassifier().fit(X, y)

    # plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=C5[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")

plt.figure()
clf = DecisionTreeClassifier().fit(df_list, C5)
plot_tree(clf, filled=True)
plt.show()
Here is the graph I get:
As you can see, the dots are not represented in the graphs. Here is an example of the expected result:
Upvotes: 1
Views: 1249
Reputation: 702
idx = np.where(y == i)
In this line, y is a list. In order to return the indices of the list elements equal to i, just convert the list to an array and everything will work fine.
So just add this line before plotting the scatter plot:
y = np.asarray(y)  # convert list to array, since np.where only works on arrays

# Plot the training points
for i, color in .....
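For illustration, here is a minimal standalone sketch of why np.where behaves differently on a list and on an array (the values below are made up, not the question's data):

import numpy as np

y = [2.0, 4.0, 3.0, 3.0, 4.0]    # a plain Python list, like the one read from the Excel sheet

# Comparing a list to a float with == is a single True/False test, not element-wise,
# so np.where receives a scalar False and returns an empty index array.
print(np.where(y == 3.0))        # (array([], dtype=int64),)

# After converting to a NumPy array, the comparison is element-wise,
# and np.where returns the indices of the matching points.
y = np.asarray(y)
print(np.where(y == 3.0))        # (array([2, 3]),)

That is why the scatter call in the question plots nothing: while y is still a list, idx is always empty.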
Upvotes: 0