OneHotEncoder raises IndexError: arrays used as indices must be of integer (or boolean) type

Question

I have a DataFrame named data with the following properties:

[880 rows x 10 columns]

MultiIndex: 880 entries, (123, 456) to (789, 890)
Data columns (total 10 columns):
Date_Diff            880 non-null float64
Response             880 non-null category
Len1                 880 non-null int64
Type1                877 non-null category
Len2                 880 non-null int64
Type2                880 non-null category
Len_Diff             880 non-null int64
Same_Institution     880 non-null category
Same_Type            880 non-null category
Score                880 non-null float64
dtypes: category(5), float64(2), int64(3)
memory usage: 82.0+ KB
None

Note: The index of the DataFrame is built from two string columns, ID1 and ID2. This is how I set the MultiIndex: data = data.set_index(['ID1','ID2'], drop = True). Because drop = True, they no longer appear as columns in the summary above.

I am trying to encode the categorical variables Type1 and Type2 using LabelEncoder and OneHotEncoder. This is my code:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Encoding function
def encode(data):
    global cat_columns
    cat_columns = list(data.select_dtypes(include=['category','object']))
    le = LabelEncoder()
    ohe = OneHotEncoder(categorical_features = cat_columns)
    for col in cat_columns:
        data[col] = le.fit_transform(data[col])
    data = ohe.fit_transform(data)
    return data

# Use encoding function
encode(data)

I get an IndexError when I run this code. The error is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
     14     return data
     15 
---> 16 encode(data)

 in encode(data)
---> 13     data = ohe.fit_transform(data)
     14     return data
     15 

/Users/username/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/data.pyc in fit_transform(self, X, y)
   1900         """
   1901         return _transform_selected(X, self._fit_transform,
-> 1902                                    self.categorical_features, copy=True)
   1903 
   1904     def _transform(self, X):

/Users/username/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/data.pyc in _transform_selected(X, transform, selected, copy)
   1706     ind = np.arange(n_features)
   1707     sel = np.zeros(n_features, dtype=bool)
-> 1708     sel[np.asarray(selected)] = True
   1709     not_sel = np.logical_not(sel)
   1710     n_selected = np.sum(sel)

IndexError: arrays used as indices must be of integer (or boolean) type

What is causing this error?
I also tried resetting the index so that ID1 and ID2 are no longer the index, but it still throws the same error.
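
For reference, the last traceback frame indexes a boolean array with `selected`, which is the `categorical_features` argument, and in the code above that argument is `cat_columns`, a list of column-name strings. The error message says such indices must be integers or booleans. Below is a minimal plain-Python sketch of turning the names into either form. The column order is copied from the DataFrame summary above; `to_positions` and `to_mask` are hypothetical helpers introduced only for illustration, and whether passing their output to `categorical_features` fully resolves the problem on this data is an assumption, not something verified here.

```python
# Column order as listed in the DataFrame summary above.
columns = ["Date_Diff", "Response", "Len1", "Type1", "Len2", "Type2",
           "Len_Diff", "Same_Institution", "Same_Type", "Score"]

# What select_dtypes(include=['category', 'object']) would pick up for
# the dtypes shown in the summary: every category column, not only
# Type1 and Type2.
cat_columns = ["Response", "Type1", "Type2", "Same_Institution", "Same_Type"]


def to_positions(all_columns, selected_names):
    """Hypothetical helper: map column names to integer positions."""
    return [all_columns.index(name) for name in selected_names]


def to_mask(all_columns, selected_names):
    """Hypothetical helper: build a boolean mask with one flag per column."""
    wanted = set(selected_names)
    return [name in wanted for name in all_columns]


cat_positions = to_positions(columns, cat_columns)
cat_mask = to_mask(columns, cat_columns)

print(cat_positions)
print(cat_mask)
```

Note that the selection also includes Response, the target variable, along with Same_Institution and Same_Type, so restricting the list to Type1 and Type2 may be closer to what was intended.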

EDIT: Adding a subset of the DataFrame below.
Some of the columns' data types have changed since I first posted; the DataFrame properties above show the updated types.
Response is the target variable and is categorical.
Same_Institution and Same_Type have been changed from integers to binary categorical variables.
Type1 and Type2 have been changed from pandas object dtype to category dtype.

ID1  ID2  Len1    Type1     Len2    Type2  Len_Diff  Date_Diff    Same_Institution  Same_Type  Score        Response
121  977  10185   PR        10185   MR     0         0            0                 0          1            1
214  753  5039    MR        4926    MR     113       9.266666667  0                 1          0.997031978  1
378  919  45404   PR        45404   PR     0         0            0                 1          1            1
283  685  821076  40-F      412353  AR     408723    0.35         0                 0          0.888266653  0
452  837  16343   PR        16343   PR     0         0            0                 1          1            1
333  726  22204   PR        20897   6-K    1307      11.3         0                 0          0.99251128   1
107  960  9781    6-K       6073    MR     3708      0.483333333  0                 0          0.933646747  0
236  768  3375    PR        2945    MR     430       46.58333333  0                 0          0.239269675  0
419  829  81247   MR        81247   MR     0         0.016666667  0                 1          1            1
184  991  51474   PR        51474   ER     0         0            0                 0          1            1
217  868  23714   ER        26633   8-K    2919      1.716666667  0                 0          0.980611207  1
202  622  4638    MR        4638    PR     0         0            0                 0          1            1
308  883  73476   ER        404584  6-K    331108    12.58333333  0                 0          0.825482503  0
186  880  291279  FIN SUPP  320893  6-K    29614     4.483333333  0                 0          0.991668299  1
305  896  22988   PR        28554   6-K    5566      22.1         0                 0          0.941192693  0

