aestheticnoodle

Reputation: 161

KeyError: "['Date'] not found in axis"?

I've been working on a Python script that uses pandas to parse a CSV file and delete any rows with missing values, that is, any row with NaN (in any capitalization) in one of its columns.

The following is my code:

import numpy as np
import pandas as pd 
import math as math
import shutil as shutil

from random import seed
from random import random


randNum = int(random() * 100) 

shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')

debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'

debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()

for key, value in debt_copy_read.iteritems():
    debt_copy_read.drop(key, axis = 0)

The expected result is that every row containing a NaN value in any column is deleted. The actual result is that I get the following error every time I run the code:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-20-3083af5a3e02> in <module>
      1 for key, value in debt_copy_read.iteritems():
----> 2     debt_copy_read.drop(key, axis = 0)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3938                                            index=index, columns=columns,
   3939                                            level=level, inplace=inplace,
-> 3940                                            errors=errors)
   3941 
   3942     @rewrite_axis_style_signature('mapper', [('copy', True),

~\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3778         for axis, labels in axes.items():
   3779             if labels is not None:
-> 3780                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3781 
   3782         if inplace:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3810                 new_axis = axis.drop(labels, level=level, errors=errors)
   3811             else:
-> 3812                 new_axis = axis.drop(labels, errors=errors)
   3813             result = self.reindex(**{axis_name: new_axis})
   3814 

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   4963             if errors != 'ignore':
   4964                 raise KeyError(
-> 4965                     '{} not found in axis'.format(labels[mask]))
   4966             indexer = indexer[~mask]
   4967         return self.delete(indexer)

KeyError: "['Date'] not found in axis"

I am trying to loop over data concerning US Debt, with the 'Date' variable in one Column and the "Debt" in the other. Any suggestions as to what went wrong/fixes are appreciated. The data is organized as follows:

Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,12354041054846.90
2010-02-05T14:30:00Z,12345510656150.00
2010-02-09T14:30:00Z,12349467132738.40
2010-02-11T14:30:00Z,12349324464284.20
2010-02-16T14:30:00Z,12384358013736.30
2010-02-17T14:30:00Z,12386495535882.20
2010-02-18T14:30:00Z,12401448666808.30

Upvotes: 0

Views: 6079

Answers (2)

Lakshmi - Intel

Reputation: 601

You don't need to iterate over the rows to delete the ones with NaN values. You can call the dropna() method of pandas.DataFrame directly. Note that dropna() returns a new DataFrame rather than modifying the original, so assign the result. Please refer to the following URL for more details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

import numpy as np
import pandas as pd 
import math as math
import shutil as shutil

from random import seed
from random import random


randNum = int(random() * 100) 

shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')

debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'

debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()

# dropna() returns a new DataFrame, so keep the result
debt_copy_read = debt_copy_read.dropna()
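
As for why the original loop failed: iterating a DataFrame with iteritems() (items() in current pandas, where iteritems() has been removed) yields (column label, Series) pairs, so the loop passed 'Date' to drop(..., axis = 0), which looks for a row labelled 'Date'. The following is a minimal sketch of that behaviour next to dropna(). The sample rows, including the blank Debt value, are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample in the question's format; the second row's Debt
# is deliberately left blank so there is something to drop.
csv_text = """Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,
2010-02-05T14:30:00Z,12345510656150.00
"""

# The first line of the text is used as the column names (the default).
df = pd.read_csv(io.StringIO(csv_text))

# Iterating a DataFrame with items() walks over columns, not rows,
# so the keys are the column labels.
column_keys = [key for key, _ in df.items()]

# Dropping a column label along axis=0 (the row index) raises KeyError,
# which is the error from the question.
try:
    df.drop("Date", axis=0)
    raised = False
except KeyError:
    raised = True

# dropna() returns a new DataFrame without the rows that contain NaN;
# the original df is left unchanged.
cleaned = df.dropna()
```

Here cleaned holds the two complete rows, while df still has all three.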

Upvotes: 1

oppressionslayer

Reputation: 7204

You can try:

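debt_copy_read = debt_copy_read.dropna()

to drop the rows that contain NaN values. (Call it on the DataFrame, debt_copy_read, not on debt_copy, which is just the file path string.)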
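
By default dropna() removes a row if any of its columns is missing. If only some columns matter, the subset and how parameters narrow that down. A small sketch on a made-up frame (the values are hypothetical, not taken from the question's file):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is missing Date, row 2 is missing Debt.
df = pd.DataFrame({
    "Date": ["2010-02-01", None, "2010-02-05"],
    "Debt": [1.0, 2.0, np.nan],
})

# Default: drop a row if ANY column is missing.
any_missing = df.dropna()

# Only look at the Debt column when deciding what to drop.
debt_only = df.dropna(subset=["Debt"])

# Drop a row only if ALL of its columns are missing.
all_missing = df.dropna(how="all")
```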

If pandas displays your Debt column in a different format (for example, scientific notation), you can change how floats are displayed with:

pd.set_option('display.float_format', lambda x: '%.2f' % x)
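
If you would rather not change the setting globally, pd.option_context applies the same option only inside a with block. A short sketch, assuming two Debt values copied from the question's sample; the variable names are just for illustration:

```python
import pandas as pd

# Two Debt values taken from the sample data in the question.
df = pd.DataFrame({"Debt": [12349463585067.40, 12354041054846.90]})

# Inside the block, floats are rendered with two decimal places;
# outside it, the display settings are back to what they were.
with pd.option_context("display.float_format", "{:.2f}".format):
    formatted_text = df.to_string()

print(formatted_text)
```

This only affects how the numbers are printed; the stored values are unchanged.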

Upvotes: 1
