Arnold

Reputation: 21

KeyError, pandas doing weird things

I have been searching for a fix for this error for days without any progress. It seems that pandas iterates over the DataFrame, and then iterates again, but at a specific iteration over the keys a KeyError is raised. Is this a problem with the interpreter, or is there a mistake in my code? Any help would be greatly appreciated.

More context:

Here is the code:

def extract_features(id_arr):
    features_df = pd.read_csv(r'D:\fma_metadata\features.csv', index_col=0, na_values=['NA'], encoding='utf-8')
    features = np.array(features_df.columns)
    id_arr = np.asarray(id_arr, dtype=int)

    for id in id_arr:
        row_features = []

        for key, value in features_df.iteritems():
            number = float(features_df[key][id])
            row_features.append(round(number, 6))

        row_features = np.asarray(row_features)
        features = np.vstack((features, row_features))

    features = np.delete(features, 0, 0)

    return features


random_id = get_random_id()

extract_features(random_id)

Error:

  Traceback (most recent call last):
  File "C:/Users/*****/PycharmProjects/****/emotions-nn/deep-learning/input.py", line 65, in <module>
    print(extract_features(random_id))
  File "C:/Users/*****/PycharmProjects/****/emotions-nn/deep-learning/input.py", line 51, in extract_features
    number = float(features_df[key][id])
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
    return self._get_value(key)
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\series.py", line 991, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\*****\anaconda3\envs\tensorflow\lib\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 800

Upvotes: 1

Views: 1121

Answers (1)

dbokers

Reputation: 920

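I'm guessing the problem is the multi-level column header in your CSV. If the file is read with the default single header row, the remaining header rows are treated as data, and the index may no longer hold the integer IDs you are looking up.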

import numpy as np
import pandas as pd

# ids can be a list of integers too
def extract(ids: np.ndarray):
    # assuming the first 3 rows are "headers"
    df = pd.read_csv(r"C:\Users\danie\Downloads\features - subset.csv", header=[0, 1, 2], index_col=0, na_values=['NA'])

    # you can set a breakpoint here to see the current column order
    # print(df.columns)
    # and reorganize it the way you want

    # this is basically what you're trying to do, if I'm not mistaken
    return df.loc[ids].round(6).to_numpy()

    # alternatively, if you need a specific column order
    # (order would be a list of column labels that you define):
    # return df.loc[ids, order].round(6).to_numpy()
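
To illustrate the guess, here is a minimal sketch on a made-up miniature CSV. The file contents and the names CSV_TEXT, naive and fixed are assumptions for the example, not your real features.csv. It compares the default parse with header=[0, 1, 2] and uses .loc so that the lookup is always by label:

```python
import io

import numpy as np
import pandas as pd

# Made-up miniature CSV with three header rows plus an index-name row.
# This is an assumption for illustration, not the real features.csv.
CSV_TEXT = """feature,chroma,chroma,mfcc
statistics,mean,std,mean
number,01,01,01
track_id,,,
2,0.1234567,1.5,3.25
5,0.7654321,2.5,4.75
"""

# Default parsing: only the first line is a header, so the other
# header lines become data rows and the index holds strings.
naive = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
print(naive.index.tolist())
print(2 in naive.index)  # the integer label 2 is absent; only the string "2" exists

# Declaring all three header rows gives an integer index.
fixed = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1, 2], index_col=0)
print(fixed.index.tolist())

# Label-based selection of several IDs at once, rounded to 6 decimals.
ids = np.asarray([2, 5], dtype=int)
features = fixed.loc[ids].round(6).to_numpy()
print(features.shape)
```

If the printed index of your own DataFrame contains strings, or simply does not contain the ID from the KeyError, that would point to the header (or to an ID that is not in the file) rather than to the interpreter.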

Upvotes: 1
