Reputation: 141
When I run the pandas get_dummies() function, it raises a KeyError stating that none of my columns exist. The following code uses copyrighted data, which I cite here: the Adult dataset from the UCI Machine Learning Repository. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
I am unsure what to try.
age, workclass, fnlwgt, education, education-num, marital-status, occupation, forces, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country,
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
# import modules
import pandas as pd

# define functions
def open_infile():
    d = pd.read_csv('adult.data.txt', sep = ',')
    return d

def onehot_encode(data):
    data = pd.get_dummies(data, columns = ['workclass', 'education', 'marital-status', 'occupation', 'forces',
                                           'relationship', 'race', 'sex', 'native-country'])
    return data

########## gather data ##########
# open infile
data = open_infile()
print(len(data))

########## process data ##########
# one-hot encode categorical columns
onehot_encode(data)
print(data.head())
Traceback (most recent call last):
File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 20, in <module>
onehot_encode(data)
File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 11, in onehot_encode
'relationship', 'race', 'sex', 'native-country'])
File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\reshape\reshape.py", line 812, in get_dummies
data_to_encode = data[columns]
File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
raise_missing=True)
File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
return self._get_listlike_indexer(obj, axis, **kwargs)[1]
File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
raise_missing=raise_missing)
File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['workclass', 'education', 'marital-status', 'occupation', 'forces',\n 'relationship', 'race', 'sex', 'native-country'],\n dtype='object')] are in the [columns]"
I expect the pandas get_dummies() function to convert all categorical attributes into numerical ones, but instead PyCharm shows a KeyError telling me that none of my columns exist, when clearly they do.
Upvotes: 3
Views: 5089
Reputation: 1327
Your main problem is in how you merged the column names from adult.names with the adult.data file.
There is no forces column in the data on the website you mentioned, yet you are passing that column to get_dummies() as well. If you merge the names and the data correctly, you will not get this error.
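Here is a minimal sketch of one way to load the data so that the names line up. It makes several assumptions: the data file has no header row (as in the original adult.data), the column list is the header from the question with forces removed, and income is a hypothetical name for the unlabeled last field. Two rows from the question stand in for the file. Note also that the header line in the question has a space after every comma, so read_csv with sep=',' keeps that space as part of each name (' workclass' rather than 'workclass'). That alone would explain why none of the requested columns were found. Passing skipinitialspace=True strips it.

```python
import io

import pandas as pd

# Column names from the question's header with 'forces' removed.
# 'income' is a hypothetical name for the unlabeled last field (<=50K / >50K).
COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week',
           'native-country', 'income']

# The categorical columns the question wants to encode, again without 'forces'.
CATEGORICAL = ['workclass', 'education', 'marital-status', 'occupation',
               'relationship', 'race', 'sex', 'native-country']

# Two rows copied from the question, standing in for the headerless data file.
SAMPLE = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)


def open_infile(source):
    # header=None: the file has no header row, so the names are supplied here.
    # skipinitialspace=True: drop the space that follows each comma.
    return pd.read_csv(source, header=None, names=COLUMNS,
                       skipinitialspace=True)


def onehot_encode(data):
    # get_dummies returns a new DataFrame; it does not modify its input.
    return pd.get_dummies(data, columns=CATEGORICAL)


data = open_infile(SAMPLE)
encoded = onehot_encode(data)  # keep the returned frame
print(encoded.columns.tolist())
```

In the question's script, the call onehot_encode(data) discards its return value, so even once the KeyError is gone, print(data.head()) would still show the unencoded frame unless the result is assigned, as in the last lines above.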
Upvotes: 1