I'm trying to build a decision tree classifier, and I have the following code:
def dtree(data, attrs, target):
    data = data[:]
    vals = []
    for entry in data:
        entry_index = attrs.index(target)
        vals.append(entry[entry_index])
    major = majority(data, attrs, target)
    if not data or (len(attrs) - 1) <= 0:
        return major
    elif vals.count(vals[0]) == len(vals):
        return vals[0]
    else:
        pick = choose(data, attrs, target)
        tree = {pick:{}}
        for each in get_vals(data, attrs, pick):
            new_d = get_data(data, attrs, pick, each)
            newAttr = attrs[:]
            newAttr.remove(pick)
            subtree = dtree(new_d, newAttr, target)
            tree[pick][each] = subtree
        return tree
Where:
data is a pandas DataFrame of my training data (33582 x 21), attrs is a list of the DataFrame headers, and target is the string name of the target attribute. vals is a list.
When I call this method, I get the following error:
File "dtree_classifier.py", line 176, in dtree
vals.append(entry[entry_index])
IndexError: string index out of range
I'm not sure what in that line is throwing the error, and I don't know how to diagnose it.
The error occurs in this part of your code:
for entry in data:
    entry_index = attrs.index(target)
    vals.append(entry[entry_index])
I guess what you want to do here is iterate over all rows of the data DataFrame and, from every row, append the value of the target column to the list vals. The problem is that iterating over a DataFrame yields its column names (strings), not its rows. So entry is a column name, and when you index that string with the position of the target column, you get an IndexError whenever the name is shorter than that position.
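This is easy to reproduce on a small frame. The sketch below uses made-up column names (a, bb, label) purely for illustration. It shows that looping over a DataFrame hands back the column labels, and that indexing the first label "a" at position 2 raises the same error. A longer name such as "label" would not fail, so whether you hit the error depends on your column names, not on your data.

```python
import pandas as pd

# Hypothetical toy frame; the column names are invented for illustration.
data = pd.DataFrame({"a": [1, 2], "bb": [3, 4], "label": ["x", "y"]})
attrs = list(data.columns)
target = "label"

# Iterating over a DataFrame yields its column labels, not its rows.
entries = [entry for entry in data]
print(entries)  # ['a', 'bb', 'label']

entry_index = attrs.index(target)  # position of the target column: 2

caught = None
try:
    # The first "entry" is the string "a", which has no character at index 2.
    entries[0][entry_index]
except IndexError as exc:
    caught = str(exc)
print("IndexError:", caught)
```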
In pandas, there is a much better way to get all values of a column as a list:
data[target].tolist()
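Here is a minimal sketch of that fix in context. It uses hypothetical toy columns outlook and play in place of the real 33582 x 21 frame. data[target].tolist() gives one value per row, which is what the loop was meant to build, so the question's purity check vals.count(vals[0]) == len(vals) then works as intended. The iterrows() version is included only to show the row-wise equivalent.

```python
import pandas as pd

# Hypothetical toy data standing in for the real training frame.
data = pd.DataFrame({"outlook": ["sunny", "rain", "sunny"],
                     "play": ["no", "yes", "no"]})
target = "play"

# Select the target column and convert it to a plain Python list.
vals = data[target].tolist()

# Row-wise equivalent of what the original loop intended: iterrows() yields
# (index, row) pairs, where each row is a Series indexed by column name.
vals_by_row = [row[target] for _, row in data.iterrows()]

# The question's "all targets identical" check now sees one value per row.
all_same = vals.count(vals[0]) == len(vals)

print(vals)      # ['no', 'yes', 'no']
print(all_same)  # False
```

On a frame of this size, iterrows() is much slower than selecting the column directly. The purity check can also be written as data[target].nunique() == 1, which ignores NaN values by default.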