ValueError using sklearn and pandas for decision trees?

Question

I'm new to scikit-learn. After reading the documentation and a few Stack Overflow posts, I'm trying to build a decision tree. I have a CSV data set with 16 attributes and 1 target label. How should I pass it to the decision tree classifier? My current code looks like this:

import pandas
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import tree

data = pandas.read_csv("yelp_atlanta_data_labelled.csv", sep=',')
vect = TfidfVectorizer()
X = vect.fit_transform(data) 
Y = data['go']

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

When I run the code it gives me the following error:

ValueError: Number of labels=501 does not match number of samples=17

To give some context, my data set has 501 data points and 17 total columns. The go column is the target column with yes/no labels.

Answers (1)
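
The sample count of 17 in the error matches the number of columns. Iterating over a pandas DataFrame yields its column labels rather than its rows, so TfidfVectorizer.fit_transform(data) treats each column name as one document and returns a matrix with 17 rows, while Y has 501 entries. Below is a minimal sketch of one way to separate features from the target. It assumes the 16 attributes are numeric or categorical rather than free text, and it uses a small made-up DataFrame (the column names "stars" and "price" are hypothetical) in place of the real CSV.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the CSV: 2 feature columns plus the "go" target.
data = pd.DataFrame({
    "stars": [4.5, 2.0, 3.5, 5.0, 1.0, 4.0],
    "price": ["low", "high", "low", "mid", "high", "mid"],
    "go": ["yes", "no", "yes", "yes", "no", "yes"],
})

# Iterating a DataFrame yields its column labels, not its rows,
# so the vectorizer sees one "document" per column name.
bad_X = TfidfVectorizer().fit_transform(data)
n_bad_samples = bad_X.shape[0]  # equals the number of columns

# Split the target from the features explicitly instead.
y = data["go"]
features = data.drop(columns=["go"])

# One-hot encode categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(features, dtype=float)
n_good_samples = X.shape[0]  # equals the number of rows

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
```

If one of the attributes really is free text, TfidfVectorizer would be applied to that single column (for example data["some_text_column"], a hypothetical name) rather than to the whole DataFrame.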
