Hezekiah Bodine

Reputation: 141

How to eliminate a KeyError with the pandas get_dummies function

When I run the pandas get_dummies() function, it raises a KeyError stating that none of my columns exist. The following code uses copyrighted data, which I am citing: the adult dataset from the UCI Machine Learning Repository. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

I am unsure what to try.

age, workclass, fnlwgt, education, education-num, marital-status, occupation, forces, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country,
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
#import modules
import pandas as pd

#define functions
def open_infile():
    d = pd.read_csv('adult.data.txt', sep = ',')
    return d

def onehot_encode(data):
    data = pd.get_dummies(data, columns = ['workclass', 'education', 'marital-status', 'occupation', 'forces',
                                         'relationship', 'race', 'sex', 'native-country'])
    return data
##########gather data##########
#open infile
data = open_infile()
print(len(data))

##########process data##########
#one-hot encode categorical columns
onehot_encode(data)
print(data.head())
Traceback (most recent call last):
  File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 20, in <module>
    onehot_encode(data)
  File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 11, in onehot_encode
    'relationship', 'race', 'sex', 'native-country'])
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\reshape\reshape.py", line 812, in get_dummies
    data_to_encode = data[columns]
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['workclass', 'education', 'marital-status', 'occupation', 'forces',\n       'relationship', 'race', 'sex', 'native-country'],\n      dtype='object')] are in the [columns]"

I expect the pandas get_dummies() function to convert all categorical attributes into numerical ones, but instead PyCharm reports a KeyError telling me that none of my columns exist, when clearly they do.

Upvotes: 3

Views: 5089

Answers (2)

Akhilesh_IN

Reputation: 1327

Your main problem is in your data, introduced while merging the adult.names file with the adult.data file. There is no forces column in the data on the website you mentioned. If you merge the data correctly, you will not get this error.

You are also passing this column to get_dummies.
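
One way to check which requested names really are missing is to compare them against the parsed column labels. The sketch below is only illustrative: the three-column sample and the `wanted` list are made up for the example and stand in for the real file and the real column list.

```python
import io
import pandas as pd

# Hypothetical sample laid out like the question's file: a space follows each comma.
raw = (
    "age, workclass, education\n"
    "39, State-gov, Bachelors\n"
    "50, Private, HS-grad\n"
)
data = pd.read_csv(io.StringIO(raw), sep=",")

# Names we intend to pass to get_dummies ('forces' is not in the sample at all).
wanted = ["workclass", "education", "forces"]

# Printing the labels as a list shows any stray whitespace inside the quotes.
print(data.columns.tolist())

# Requested names with no exact match among the parsed labels.
missing = [c for c in wanted if c not in data.columns]
print(missing)

# The same check after stripping whitespace from the labels.
stripped = data.columns.str.strip()
still_missing = [c for c in wanted if c not in stripped]
print(still_missing)
```

In this sample every requested name fails the exact match, but only forces is still missing once whitespace is stripped, which separates a genuinely absent column from a name that merely carries extra spaces.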

Upvotes: 1

jezrael

Reputation: 863166

The problem is whitespace around the column names (in the header shown, a space follows each comma, so the names start with a space). The solution is to use str.strip:

data.columns = data.columns.str.strip()

Or use a list comprehension with strip:

data.columns = [x.strip() for x in data.columns]
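
A minimal end-to-end sketch of the fix, assuming a made-up three-column sample in place of the real file. It also shows `skipinitialspace=True`, a `read_csv` option that is not part of the original answer but drops the space after each delimiter at parse time, so both the column names and the values come out clean.

```python
import io
import pandas as pd

# Hypothetical sample laid out like the question's file: a space follows each comma.
raw = (
    "age, workclass, sex\n"
    "39, State-gov, Male\n"
    "28, Private, Female\n"
)

# Option 1: read as-is, then strip whitespace from the column names.
data = pd.read_csv(io.StringIO(raw), sep=",")
data.columns = data.columns.str.strip()
# get_dummies returns a new DataFrame, so the result has to be assigned.
encoded = pd.get_dummies(data, columns=["workclass", "sex"])
print(encoded.columns.tolist())

# Option 2: skip the space after each delimiter while parsing.
data2 = pd.read_csv(io.StringIO(raw), sep=",", skipinitialspace=True)
encoded2 = pd.get_dummies(data2, columns=["workclass", "sex"])
print(encoded2.columns.tolist())
```

With option 1 the cell values keep their leading space, so the generated names look like "workclass_ Private"; option 2 avoids that. Note also that the question's script calls onehot_encode(data) without storing the return value, so data.head() would still print the unencoded frame even after the KeyError is fixed.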

Upvotes: 3
