Reputation: 43
I am trying to run a Decision Tree Classifier, but I get this error. Could you please explain how to fix it? My English isn't very good, but I will try to understand! I'm just starting to learn programming, so please point out anything that could be improved. Thank you!
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import pandas as pd
sale=pd.read_csv('Online_Sale.csv')
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']
sale['回購'] = sale['回購'].apply(lambda x: 1 if x == 'Y' else 0)
sale['單位售價'] = sale['單位售價'].str.replace(',', '').astype(float)
x=sale['年紀'],sale['單位售價']
y=sale['回購']
print(x)
print(y)
clf = DecisionTreeClassifier(random_state=0)
model = clf.fit(x, y)
text_representation = tree.export_text(clf)
print(text_representation)
fig = plt.figure(figsize=(15,12))
tree.plot_tree(clf, filled=True)
fig.savefig("decistion_tree.png")
data:
I tried a lot of different approaches, but I still can't fully understand what the problem is...
Upvotes: 4
Views: 100
Reputation: 11942
There is just a small error with:
x=sale['年紀'],sale['單位售價']
Rather than selecting the two columns you want, this creates a tuple of two Series. scikit-learn then reads that tuple as 2 samples (one per column) instead of 2 features, hence the end of the error message: ... does not match number of samples=2
One way to create a new pd.DataFrame with your selected columns:
x=sale[['年紀', '單位售價']]
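To see the difference in shape, here is a minimal sketch. The rows are made up to stand in for Online_Sale.csv (the real file is not shown in the question), and it assumes 回購 has already been converted to 0/1 as in the original code:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up rows standing in for Online_Sale.csv (values are illustrative only).
sale = pd.DataFrame({
    "年紀": [23, 35, 41, 29, 52, 38],
    "單位售價": [1200.0, 850.0, 430.0, 990.0, 1500.0, 620.0],
    "回購": [1, 0, 0, 1, 1, 0],
})
y = sale["回購"]

# Original form: a tuple of two Series. Converted to an array it has
# one row per column, so it looks like 2 samples with 6 features each.
x_tuple = sale["年紀"], sale["單位售價"]
tuple_shape = np.asarray(x_tuple).shape

# Corrected form: a DataFrame with one row per sample and one column per feature.
x_frame = sale[["年紀", "單位售價"]]
frame_shape = x_frame.shape

print(tuple_shape, frame_shape)

# With the DataFrame, the number of samples matches len(y), so fit() succeeds.
clf = DecisionTreeClassifier(random_state=0).fit(x_frame, y)
print(clf.n_features_in_)
```

Printing x.shape (or np.asarray(x).shape) before calling fit is a quick way to check that the first dimension equals len(y).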
Upvotes: 2