yukichia

Reputation: 43

ValueError: Number of labels=34866 does not match number of samples=2

I am trying to run a Decision Tree Classifier, but I get this error. Could you please explain how to fix it? My English isn't very good, but I will try to understand! I'm just starting to learn programming, so please point out anything that could be improved. Thank you!

import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier 
from sklearn import tree
import pandas as pd

sale=pd.read_csv('Online_Sale.csv')
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']
sale['回購'] = sale['回購'].apply(lambda x: 1 if x == 'Y' else 0)
sale['單位售價'] = sale['單位售價'].str.replace(',', '').astype(float)
x=sale['年紀'],sale['單位售價']
y=sale['回購']

print(x)
print(y)

clf = DecisionTreeClassifier(random_state=0)
model = clf.fit(x, y)

text_representation = tree.export_text(clf)
print(text_representation)

fig = plt.figure(figsize=(15,12))
tree.plot_tree(clf, 
              filled=True)
fig.savefig("decistion_tree.png")

data: (screenshot of the dataset, image not available)

I looked at a lot of different approaches, but I still could not fully understand what the problem is...

Upvotes: 4

Views: 100

Answers (1)

rickhg12hs

Reputation: 11942

There is just a small error with:

x=sale['年紀'],sale['單位售價']

Rather than selecting the columns you want, this creates a tuple of two Series. scikit-learn treats that tuple as an array with 2 rows (one per Series), while y has 34866 labels, hence the end of the error message: ... does not match number of samples=2

One way to create a new pd.DataFrame with your selected columns:

x=sale[['年紀', '單位售價']]
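To see the difference, here is a minimal sketch that swaps Online_Sale.csv for a small made-up DataFrame (the six rows below are hypothetical, only the column names come from the question). It compares the tuple with the double-bracket selection and then fits the classifier on each. The exact wording of the ValueError may differ between scikit-learn versions, so the sketch only prints it.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for Online_Sale.csv; the values are invented.
sale = pd.DataFrame({
    "年紀": [25, 32, 47, 51, 38, 29],
    "單位售價": [120.0, 80.5, 300.0, 45.0, 210.0, 99.9],
    "回購": [1, 0, 1, 0, 1, 0],
})

x_tuple = sale["年紀"], sale["單位售價"]  # a tuple holding two Series
x_frame = sale[["年紀", "單位售價"]]      # a DataFrame: one row per sample
y = sale["回購"]

print(type(x_tuple).__name__, len(x_tuple))  # tuple with 2 elements
print(x_frame.shape)                         # (rows, columns)

clf = DecisionTreeClassifier(random_state=0)

# The tuple becomes an array with 2 rows, which does not match len(y).
tuple_failed = False
try:
    clf.fit(x_tuple, y)
except ValueError as err:
    tuple_failed = True
    print("fit on tuple failed:", err)

# The DataFrame has one row per label, so fitting works.
clf.fit(x_frame, y)
predictions = clf.predict(x_frame)
```

With the real data, x_frame.shape should report 34866 rows and 2 columns, which matches the number of labels in y.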

Upvotes: 2
