Reputation: 383
I am trying to generate a decision tree in scikit-learn. I have a CSV file that I provide as input to my program. When I print the dataset length it is 502, and the dataset shape is (502, 1), so there is only one column.
How do I fit this data to the decision tree and get a result? I am not sure if I am doing it correctly. My code is below.
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
input_file = "output.csv"
# for tab delimited use:
df = pd.read_csv(input_file, header = 0, delimiter = "\t")
# printing the original column values in a python list
print(df.values)
print("DataSet Length :",len(df))
print("DataSet Shape :",df.shape)
# Assigning values to an array
X=df.values[:,0]
# split the data into training and test sets
X_train,X_test=train_test_split(X,test_size=0.3,random_state=100)
# Passing to the Decision Tree Classifier, with entropy criterion
clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                                     max_depth=3, min_samples_leaf=5)
# Fitting the data to the classifier
clf_entropy.fit(X_train)
The CSV file is at the link below:
https://drive.google.com/file/d/0B3XlF206d5UrVnh6QS1LRW0xT0U/view?usp=sharing
Download it and open it in Excel. I am using the scikit-learn documentation for reference.
Upvotes: 0
Views: 2538
Reputation: 19634
To fit a decision tree classifier, your training and testing data need to have labels. The tree is fitted against these labels. Here is an example from the scikit-learn website:
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
The problem is that in your code you have only X values, without labels (Y values), so you cannot fit the tree.
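As a rough sketch of what a labelled version of your script could look like: the column names feature_a, feature_b and label and the tiny inline dataset are made up for illustration, since I do not know what your CSV contains. It also imports train_test_split from sklearn.model_selection, which is where it lives in current scikit-learn releases. Note that X has to be two-dimensional (samples by features), which is another reason df.values[:,0] on its own would not work.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical data standing in for the CSV: two features and one label column.
df = pd.DataFrame({
    "feature_a": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "feature_b": [0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "label":     [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Features must be 2-D: shape (n_samples, n_features).
X = df[["feature_a", "feature_b"]].values
# Labels are 1-D: shape (n_samples,).
y = df["label"].values

# Split features and labels together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

# Fit the tree on features AND labels, then score it on the held-out rows.
clf_entropy = DecisionTreeClassifier(
    criterion="entropy", random_state=100, max_depth=3
)
clf_entropy.fit(X_train, y_train)

y_pred = clf_entropy.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
```

If your file really has a single column and nothing that can serve as a label, a classifier is not the right tool, and you would first need to decide which quantity you are trying to predict.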
Upvotes: 2