Reputation: 161
I've been working on a Python script that parses Excel data with pandas and tries to delete any rows with missing values, that is, any row with NaN in one of its columns, regardless of how "NaN" is capitalized.
The following is my code:
import numpy as np
import pandas as pd
import math as math
import shutil as shutil
from random import seed
from random import random
randNum = int(random() * 100)
shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')
debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'
debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()
for key, value in debt_copy_read.iteritems():
    debt_copy_read.drop(key, axis = 0)
The expected result is that every row containing a NaN value in any column is deleted. The actual result is that I keep getting the following error when I run the code:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-20-3083af5a3e02> in <module>
1 for key, value in debt_copy_read.iteritems():
----> 2 debt_copy_read.drop(key, axis = 0)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3938 index=index, columns=columns,
3939 level=level, inplace=inplace,
-> 3940 errors=errors)
3941
3942 @rewrite_axis_style_signature('mapper', [('copy', True),
~\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3778 for axis, labels in axes.items():
3779 if labels is not None:
-> 3780 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
3781
3782 if inplace:
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
3810 new_axis = axis.drop(labels, level=level, errors=errors)
3811 else:
-> 3812 new_axis = axis.drop(labels, errors=errors)
3813 result = self.reindex(**{axis_name: new_axis})
3814
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
4963 if errors != 'ignore':
4964 raise KeyError(
-> 4965 '{} not found in axis'.format(labels[mask]))
4966 indexer = indexer[~mask]
4967 return self.delete(indexer)
KeyError: "['Date'] not found in axis"
I am trying to loop over data on US debt, with "Date" in one column and "Debt" in the other. Any suggestions as to what went wrong, or how to fix it, are appreciated. The data is organized as follows:
Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,12354041054846.90
2010-02-05T14:30:00Z,12345510656150.00
2010-02-09T14:30:00Z,12349467132738.40
2010-02-11T14:30:00Z,12349324464284.20
2010-02-16T14:30:00Z,12384358013736.30
2010-02-17T14:30:00Z,12386495535882.20
2010-02-18T14:30:00Z,12401448666808.30
Upvotes: 0
Views: 6079
Reputation: 601
You don't need to iterate over the rows to delete the ones with NaN values. You can call the dropna() method of pandas.DataFrame directly. Note that dropna() returns a new DataFrame rather than modifying the original, so assign the result if you want to keep it. Please refer to the following URL for more details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
import numpy as np
import pandas as pd
import math as math
import shutil as shutil
from random import seed
from random import random
randNum = int(random() * 100)
shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')
debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'
debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()
# dropna() returns a new DataFrame, so assign the result back
debt_copy_read = debt_copy_read.dropna()
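To see why the original loop raised the KeyError, here is a minimal sketch. It assumes a small in-memory sample in the same Date,Debt layout as the question, with one missing Debt value added purely for illustration; the names `csv_text`, `column_labels`, and `cleaned` are made up for this example. Iterating over a DataFrame yields column labels, so the loop ended up asking drop() to find a row labelled 'Date'.

```python
import io

import pandas as pd

# Hypothetical sample in the question's layout; the empty Debt field
# on the second data row is an assumption added for illustration.
csv_text = """Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,
2010-02-05T14:30:00Z,12345510656150.00
"""

df = pd.read_csv(io.StringIO(csv_text))

# Iterating a DataFrame yields its column labels, not its row labels.
column_labels = list(df)

# drop(..., axis=0) looks the label up in the row index (0, 1, 2),
# so passing the column name 'Date' raises KeyError.
try:
    df.drop("Date", axis=0)
    drop_failed = False
except KeyError:
    drop_failed = True

# dropna() removes every row that has at least one missing value and
# returns a new DataFrame; the original df is left unchanged.
cleaned = df.dropna()
print(cleaned)
```

The sketch uses `list(df)` rather than `iteritems()` because `iteritems()` is no longer available in recent pandas releases (`items()` is the replacement).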
Upvotes: 1
Reputation: 7204
You can try:
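debt_copy_read = debt_copy_read.dropna()
to drop the rows that contain NaN values. (Call it on the DataFrame, debt_copy_read; debt_copy is only the file path string.)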
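The question also asks about different capitalizations of NaN. As a rough sketch, and assuming the missing values appear in the file as text such as "NAN" or "Nan", you could list those spellings in the `na_values` parameter of `read_csv` so they are parsed as missing before `dropna()` runs. The sample data and the extra spellings below are assumptions for illustration, not taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical sample: three rows carry differently capitalized markers.
csv_text = """Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,NAN
2010-02-05T14:30:00Z,Nan
2010-02-09T14:30:00Z,nan
2010-02-11T14:30:00Z,12349324464284.20
"""

# na_values adds extra strings to the markers read_csv already treats
# as missing; which spellings to add depends on your file.
df = pd.read_csv(io.StringIO(csv_text), na_values=["NAN", "Nan"])

# Keep only the rows whose Debt value is present.
cleaned = df.dropna(subset=["Debt"])
print(cleaned)
```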
If pandas displays your Debt column in a different format (for example, scientific notation), you can set the display format with:
pd.set_option('display.float_format', lambda x: '%.2f' % x)
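As a small illustration, the sketch below applies the same display option temporarily with `pd.option_context`, using one Debt value from the question. It is only meant to show that the option changes how floats are printed, not the stored values; the variable names are made up for this example.

```python
import pandas as pd

# One value from the question's data, used as a stand-in for the column.
df = pd.DataFrame({"Debt": [12349463585067.40]})

# option_context applies the setting only inside the with-block,
# whereas pd.set_option changes it for the rest of the session.
with pd.option_context("display.float_format", "{:.2f}".format):
    shown = df.to_string()

print(shown)

# The underlying float is untouched; only its printed form changed.
stored = df["Debt"].iloc[0]
```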
Upvotes: 1