tushariyer

Reputation: 958

Fixing an Index Error for a dataframe

I'm trying to build a decision tree classifier, and I have the following code:

def dtree(data, attrs, target):

    data = data[:]
    vals = []

    for entry in data:
        entry_index = attrs.index(target)
        vals.append(entry[entry_index])

    major = majority(data, attrs, target)

    if not data or (len(attrs) - 1) <= 0:
        return major
    elif vals.count(vals[0]) == len(vals):
        return vals[0]
    else:
        pick = choose(data, attrs, target)
        tree = {pick:{}}

        for each in get_vals(data, attrs, pick):
            new_d = get_data(data, attrs, pick, each)
            newAttr = attrs[:]
            newAttr.remove(pick)
            subtree = dtree(new_d, newAttr, target)
            tree[pick][each] = subtree

    return tree

Where:

When I call this method, I get the following error:

File "dtree_classifier.py", line 176, in dtree
   vals.append(entry[entry_index])

IndexError: string index out of range

I'm not sure what about that line is throwing the error and I don't know what I'm supposed to do to diagnose it.

Here's a data example: (image not available)

Upvotes: 0

Views: 487

Answers (1)

Grigoriy Mikhalkin

Reputation: 5573

The error occurs in this part of your code:

for entry in data:
    entry_index = attrs.index(target)
    vals.append(entry[entry_index])

I guess what you want to do here is iterate over all rows of the data DataFrame and, for every row, add the value of the target column to the list vals. The problem is that iterating over a DataFrame yields its column names (strings), not its rows. So when you index the entry string with the position of the target column, and that position is past the end of the column name, you get an IndexError.
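A small sketch that reproduces the behaviour may help. The DataFrame below is made up for illustration (the column names and values are assumptions, not the data from the question), but the loop has the same shape as the one in dtree:

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [25, 40, 33],
    "income": [30, 80, 55],
    "buys": ["no", "yes", "yes"],
})
attrs = list(data.columns)
target = "buys"
entry_index = attrs.index(target)  # position of the target column in attrs

seen = []     # what the loop variable actually holds
error = None  # message of the exception, if one is raised

try:
    for entry in data:          # yields column labels, not rows
        seen.append(entry)
        entry[entry_index]      # indexes into the label string itself
except IndexError as exc:
    error = str(exc)

print(seen)   # the column labels visited before the failure
print(error)  # the IndexError message
```

Here the first value of entry is the string "id", which is too short to be indexed at the position of the target column, so the loop fails on its first pass.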

In pandas, there is a much better way to get all values of a column as a list:

data[target].tolist()
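As a rough sketch of how that could replace the loop, assuming data is a pandas DataFrame and target is one of its column names (the sample data here is hypothetical):

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
data = pd.DataFrame({
    "age": [25, 40, 33],
    "income": [30, 80, 55],
    "buys": ["no", "yes", "yes"],
})
target = "buys"

# Select the target column by name and convert it to a plain Python list.
vals = data[target].tolist()

# The "all values are identical" check from dtree works unchanged on the list.
all_same = vals.count(vals[0]) == len(vals)

# If row-by-row access is really needed, iterrows() yields (index, row) pairs.
row_vals = [row[target] for _, row in data.iterrows()]

print(vals)
print(all_same)
```

Selecting the column by name also removes the need for attrs.index(target) at this step, since no positional lookup is involved.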

Upvotes: 1
