I have spent days searching for a fix for this error, with no progress whatsoever. It seems that pandas iterates over the DataFrame and then starts over, but at one specific iteration the key lookup raises a KeyError. Could this be a problem with the interpreter, or is there a mistake in my code? Any help would be greatly appreciated.
More context:
Subset of features_df: https://www.transfernow.net/3yS7pE092020
Parameter passed to the function: an np.array of IDs (dtype=int) to look up in the dataset
Here is the code:
```python
import numpy as np
import pandas as pd


def extract_features(id_arr):
    features_df = pd.read_csv(r'D:\fma_metadata\features.csv', index_col=0, na_values=['NA'], encoding='utf-8')
    # the first row of the result holds the column names
    features = np.array(features_df.columns)
    id_arr = np.asarray(id_arr, dtype=int)
    for id in id_arr:
        row_features = []
        # collect every column's value for this ID
        for key, value in features_df.iteritems():
            number = float(features_df[key][id])
            row_features.append(round(number, 6))
        row_features = np.asarray(row_features)
        features = np.vstack((features, row_features))
    # drop the column-name row again
    features = np.delete(features, 0, 0)
    return features


# get_random_id() is my own helper (not shown here)
random_id = get_random_id()
extract_features(random_id)
```
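In the loop above, `features_df[key][id]` first selects the column `key` and then looks up `id` as a row *label*, not as a row position. A minimal sketch on hypothetical toy data (the column name `feat_a` and the IDs 2, 3 and 5 are made up) shows what happens when an ID is not among the labels:

```python
import pandas as pd

# Hypothetical toy data: the track IDs 2, 3, 5 are row labels, not positions
df = pd.DataFrame({"feat_a": [0.1, 0.2, 0.3]}, index=[2, 3, 5])

# df[key][id] picks the column `key`, then looks up the row *label* `id`
value = df["feat_a"][3]
print(value)  # 0.2, because label 3 exists

try:
    df["feat_a"][4]  # 4 is not a label in the index
    lookup_error = None
except KeyError as err:
    lookup_error = err
print(repr(lookup_error))  # KeyError(4)
```

So a KeyError here means the ID is missing from the index as a label, even if the DataFrame has more rows than that number.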
Error:
```
Traceback (most recent call last):
  File "C:/Users/*****/PycharmProjects/****/emotions-nn/deep-learning/input.py", line 65, in <module>
    print(extract_features(random_id))
  File "C:/Users/*****/PycharmProjects/****/emotions-nn/deep-learning/input.py", line 51, in extract_features
    number = float(features_df[key][id])
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
    return self._get_value(key)
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\series.py", line 991, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 800
```
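The traceback passes through `Series._get_value` and `Index.get_loc`, i.e. a label lookup, so `KeyError: 800` means the label 800 was not found in that column's index. A minimal sketch for narrowing this down, assuming the IDs are meant to match the DataFrame's row labels, is to list the requested IDs that are missing before looping. `find_missing_ids` is a hypothetical helper, not a pandas function, and the toy data is made up:

```python
import numpy as np
import pandas as pd


def find_missing_ids(df: pd.DataFrame, ids) -> np.ndarray:
    """Hypothetical helper: return the requested IDs that are not row labels of df."""
    ids = np.asarray(ids, dtype=int)
    # Index.isin checks each requested ID against the DataFrame's row labels
    present = pd.Index(ids).isin(df.index)
    return ids[~present]


# Hypothetical toy data standing in for features_df
df = pd.DataFrame({"feat_a": [0.1, 0.2, 0.3]}, index=[2, 3, 5])
missing = find_missing_ids(df, [2, 800, 5])
print(missing)  # [800]
# if this shows object instead of an integer dtype, the labels may be strings
print(df.index.dtype)
```

If every requested ID comes back as missing, the index most likely does not hold integer IDs at all, which points at how the CSV was read rather than at the IDs themselves.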
Answer:
I'm guessing the problem is the multi-level column header (the first three rows of the CSV).
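The effect of a multi-row header can be sketched on a small hypothetical CSV (the column names, IDs and values are made up) with three header rows followed by a row holding only the index name. With the default `header=0`, the extra header rows become data rows, so the row labels come back as strings and an integer ID never matches them as a label; with `header=[0, 1, 2]`, the three rows form a column MultiIndex and the IDs stay integers:

```python
import io

import pandas as pd

# Hypothetical CSV: three header rows, then an index-name row, then data
csv_text = (
    "feature,chroma_cens,chroma_cens\n"
    "statistics,kurtosis,mean\n"
    "number,01,01\n"
    "track_id,,\n"
    "2,0.5,0.1\n"
    "3,0.25,0.2\n"
)

# Default header=0: the other header rows are read as data,
# so the row labels are strings
flat = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(flat.index))  # ['statistics', 'number', 'track_id', '2', '3']

# header=[0, 1, 2]: the three rows form a column MultiIndex
# and the row labels are the integer IDs
multi = pd.read_csv(io.StringIO(csv_text), header=[0, 1, 2], index_col=0)
print(multi.columns.nlevels)  # 3
print(list(multi.index))      # [2, 3]
print(multi.loc[3, ("chroma_cens", "mean", "01")])  # 0.2
```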
```python
import numpy as np
import pandas as pd


# ids can be a list of integers too
def extract(ids: np.ndarray, order=None):
    # assuming the first 3 rows are "headers"
    df = pd.read_csv(r"C:\Users\danie\Downloads\features - subset.csv", header=[0, 1, 2], index_col=0, na_values=['NA'])
    # you can set a breakpoint here to see the current column order
    # print(df.columns)
    # and reorganize it the way you want
    if order is not None:
        # if there's a column order, select the columns in that order
        return df.loc[ids, order].round(6).to_numpy()
    # this is basically what you're trying to do, if I'm not mistaken
    return df.loc[ids].round(6).to_numpy()
```
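In the answer's code, `order` stands for a column order that is not spelled out. As a sketch of what it could look like, assuming the columns are the three-level MultiIndex produced by `header=[0, 1, 2]`, `order` can be a list of full column tuples. The feature names, values and IDs below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame read with header=[0, 1, 2]
columns = pd.MultiIndex.from_tuples(
    [("chroma_cens", "mean", "01"), ("mfcc", "mean", "01"), ("mfcc", "std", "01")],
    names=["feature", "statistics", "number"],
)
df = pd.DataFrame(
    [[0.1234567, 1.0, 2.0], [0.7654321, 3.0, 4.0], [0.5, 5.0, 6.0]],
    index=pd.Index([2, 3, 5], name="track_id"),
    columns=columns,
)

ids = np.array([5, 2])
# hypothetical column order: one full (feature, statistics, number) tuple per column
order = [("mfcc", "std", "01"), ("chroma_cens", "mean", "01")]

# rows come back in the order of ids, columns in the order of `order`
result = df.loc[ids, order].round(6).to_numpy()
print(result)
```

Selecting by label this way replaces the nested loop and `np.vstack` calls with one lookup, and an ID missing from the index still raises a KeyError, so the membership check remains useful.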