Kavya

Reputation: 23

How to fix 'KeyError: Index' error in Jupyter Notebook

I am building a neural network model in Jupyter Notebook and have imported the necessary libraries. I have two datasets, which I merged into one. After merging, running the code below raises a KeyError: Index([...]) error. Can you please help me solve this issue?

The code:

merge_vector = ["school","sex","age","address",
                "famsize","Pstatus","Medu","Fedu",
                "Mjob","Fjob","reason","nursery","internet"]

duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-4f1a3ab8858b> in <module>()
----> 1 duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)

E:\Anaconda2\envs\tensorflow\lib\site-packages\pandas\core\frame.py in duplicated(self, subset, keep)
   4379         diff = Index(subset).difference(self.columns)
   4380         if not diff.empty:
-> 4381             raise KeyError(diff)
   4382 
   4383         vals = (col.values for name, col in self.iteritems()

KeyError: Index(['Fedu', 'Fjob', 'Medu', 'Mjob', 'Pstatus', 'address', 'age', 'famsize',
       'internet', 'nursery', 'reason', 'school', 'sex'],
      dtype='object')
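The traceback shows pandas computing Index(subset).difference(self.columns) and raising KeyError with the result, so the error lists exactly the names in merge_vector that merged_df does not have. Here all 13 names are listed, which suggests merged_df has none of them under those exact spellings. A minimal sketch of running the same check yourself; the tiny merged_df below is a hypothetical stand-in, since the real merged data is not shown:

```python
import pandas as pd

# Hypothetical stand-in for the real merged DataFrame.
merged_df = pd.DataFrame({"school": ["GP", "GP"], "sex": ["F", "F"]})

merge_vector = ["school", "sex", "age", "address"]

# Same check DataFrame.duplicated performs internally (see the traceback):
# names requested in subset that are not columns of the DataFrame.
missing = pd.Index(merge_vector).difference(merged_df.columns)

print(list(missing))            # the names that would trigger the KeyError
print(list(merged_df.columns))  # the column names the DataFrame really has
```

Comparing the two printed lists usually shows whether the names were renamed, misspelled, or never loaded as separate columns.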

Libraries imported for the NN model:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from math import floor, ceil
from pylab import rcParams

%matplotlib inline
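The merge code itself is not shown, so the following is only one possible cause, offered as an assumption: when pd.merge joins two frames that share column names not listed in on=, pandas renames those overlapping columns with the default suffixes _x and _y, and the original names no longer exist in the result. A small sketch with two hypothetical frames (d1, d2 and the G3 column are illustrative only):

```python
import pandas as pd

# Two hypothetical datasets that share the columns "school", "sex" and "G3".
d1 = pd.DataFrame({"school": ["GP", "MS"], "sex": ["F", "M"], "G3": [10, 12]})
d2 = pd.DataFrame({"school": ["GP", "MS"], "sex": ["F", "M"], "G3": [11, 9]})

# Merge on "school" only: the other shared columns get _x/_y suffixes.
merged_df = pd.merge(d1, d2, on=["school"])
print(list(merged_df.columns))

# "sex" is now absent, so duplicated(subset=[..., "sex"]) raises KeyError.
try:
    merged_df.duplicated(keep=False, subset=["school", "sex"])
except KeyError as err:
    print("KeyError:", err)
```

If this is what happened, listing every shared column in on= (or renaming the suffixed columns back) keeps the original names.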

Upvotes: 2

Views: 5128

Answers (1)

jezrael

Reputation: 863731

Some of the names in merge_vector do not exist as columns in the DataFrame, so intersect merge_vector with the DataFrame's column names and pass only the columns that actually exist:

merge_vector = ["school","sex","age","address",
                "famsize","Pstatus","Medu","Fedu",
                "Mjob","Fjob","reason","nursery","internet"]

merged_df = pd.DataFrame({'internet':[4,5,5],
                          'school':[7,8,8],
                          'new':[1,2,3]})
print (merged_df)
   internet  school  new
0         4       7    1
1         5       8    2
2         5       8    3

existed_cols = merged_df.columns.intersection(merge_vector)
print (existed_cols)
Index(['internet', 'school'], dtype='object')

duplicated_mask = merged_df.duplicated(keep=False, subset=existed_cols)
print (duplicated_mask)
0    False
1     True
2     True
dtype: bool
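A variant of the same idea, as a sketch: duplicated_on_existing below is a hypothetical helper, not a pandas function. It keeps only the columns the DataFrame has, preserves the order of merge_vector, and raises a clear error when none of them exist. That last case is the one in the question's traceback, where every listed column is missing and a plain intersection would come back empty:

```python
import pandas as pd

def duplicated_on_existing(df, cols, keep=False):
    """Hypothetical helper: mark duplicate rows using only the names in
    `cols` that are actual columns of `df`."""
    existing = [c for c in cols if c in df.columns]  # keeps the order of cols
    if not existing:
        # Nothing to compare on: fail loudly instead of guessing.
        raise KeyError(f"none of {cols} are columns of the DataFrame")
    return df.duplicated(keep=keep, subset=existing)

merged_df = pd.DataFrame({'internet': [4, 5, 5],
                          'school': [7, 8, 8],
                          'new': [1, 2, 3]})
merge_vector = ["school", "sex", "age", "internet"]

mask = duplicated_on_existing(merged_df, merge_vector)
print(mask.tolist())  # [False, True, True]
```

Silently dropping missing columns changes what counts as a duplicate, so it is worth printing or logging which names were skipped before relying on the mask.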

Upvotes: 3
