Fixing an Index Error for a dataframe

Question

I'm trying to build a decision tree classifier, and I have the following code:

def dtree(data, attrs, target):

    data = data[:]
    vals = []

    for entry in data:
        entry_index = attrs.index(target)
        vals.append(entry[entry_index])

    major = majority(data, attrs, target)

    if not data or (len(attrs) - 1) <= 0:
        return major
    elif vals.count(vals[0]) == len(vals):
        return vals[0]
    else:
        pick = choose(data, attrs, target)
        tree = {pick:{}}

        for each in get_vals(data, attrs, pick):
            new_d = get_data(data, attrs, pick, each)
            newAttr = attrs[:]
            newAttr.remove(pick)
            subtree = dtree(new_d, newAttr, target)
            tree[pick][each] = subtree

    return tree

Where:

data is a pandas dataframe of my training data (33582 x 21),
attrs is a list of the dataframe headers,
target is the string name of the target attribute.
vals is a list

When I call this method, I get the following error:

File "dtree_classifier.py", line 176, in dtree
   vals.append(entry[entry_index])

IndexError: string index out of range

I'm not sure what about that line is throwing the error and I don't know what I'm supposed to do to diagnose it.

Here's a data example:

Grigoriy Mikhalkin · Accepted Answer

The error occurs in this part of your code:

for entry in data:
    entry_index = attrs.index(target)
    vals.append(entry[entry_index])

I guess what you want to do here is iterate over all rows of the data DataFrame and, for every row, add the value in the target column to the list vals. The problem is that iterating over a DataFrame yields its column names (strings), not its rows. So entry is a column name, and entry[entry_index] indexes into that string. As soon as a column name is shorter than or equal in length to the position of target in attrs, you get IndexError: string index out of range.
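To see this for yourself, here is a minimal sketch that reproduces the behaviour on a toy DataFrame. The column names and values are made up for illustration and are not taken from your data:

```python
import pandas as pd

# Hypothetical toy data; the real frame in the question is 33582 x 21.
data = pd.DataFrame({
    "a": [1, 0, 1],
    "b": ["x", "y", "x"],
    "label": ["yes", "no", "yes"],
})
attrs = list(data.columns)
target = "label"

# Iterating over a DataFrame yields the column labels, not the rows.
entries = [entry for entry in data]
print(entries)

# Position of the target column within attrs (2 for this toy frame).
entry_index = attrs.index(target)

# Indexing a short column name with that position fails the same way
# as the traceback in the question.
try:
    entries[0][entry_index]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

Printing entry inside your own loop is a quick way to confirm the same thing on the real data.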

In pandas, there is a much better way to get all the values of a column as a list:

data[target].tolist()
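As a rough sketch of how that could replace the loop, assuming data is a DataFrame and target is one of its column names: the helper names target_values and is_pure below are hypothetical and not part of your code, and the toy data is invented.

```python
import pandas as pd


def target_values(data: pd.DataFrame, target: str) -> list:
    """Return every value of the target column as a plain Python list."""
    return data[target].tolist()


def is_pure(data: pd.DataFrame, target: str) -> bool:
    """True when every row has the same target value.

    Equivalent in spirit to vals.count(vals[0]) == len(vals),
    but computed by pandas on the column directly.
    """
    return bool(data[target].nunique(dropna=False) == 1)


# Hypothetical toy data for illustration only.
data = pd.DataFrame({
    "outlook": ["sunny", "rain", "sunny"],
    "play": ["no", "yes", "no"],
})

vals = target_values(data, "play")
print(vals)
print(is_pure(data, "play"))

# A subset where all rows share the same target value.
sunny = data[data["outlook"] == "sunny"]
print(is_pure(sunny, "play"))
```

Note that later in the same function, not data will raise a ValueError for a DataFrame because its truth value is ambiguous; data.empty is the usual way to test for an empty frame.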
