Reputation: 151
I have a dictionary that is filled with data from two files I imported, but some of the data comes out as nan. How do I remove the pieces of data with nan?
My code is:
import matplotlib.pyplot as plt
from pandas import Timestamp
import numpy as np
from datetime import datetime
import pandas as pd
import collections
orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date'])
specificdrugs=pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\Drugs.txt',sep=',')
"""This is a dictionary that collects data from the .txt file
This dictionary has a key,value pair for every generic name with its corresponding approval date """
drugdict={}
for d in specificdrugs['Generic Name']:
    drugdict.dropna()
    drugdict[d]=orangebook[orangebook.Ingredient==d.upper()]['Approval_Date'].min()
What should I add or take away from this code to make sure that there are no key,value pairs in the dictionary with a value of nan?
Upvotes: 15
Views: 53747
Reputation: 1458
A slightly modified version of twinlakes's approach is to use pandas.isna() instead of math.isnan().
If NaNs are being stored as keys:
# functional (filter the items; filtering the dict itself would yield only its keys)
clean_dict = dict(filter(lambda kv: not pd.isna(kv[0]), my_dict.items()))
# dict comprehension
clean_dict = {k: v for k, v in my_dict.items() if not pd.isna(k)}
If NaNs are being stored as values:
# functional
clean_dict = dict(filter(lambda kv: not pd.isna(kv[1]), my_dict.items()))
# dict comprehension
clean_dict = {k: v for k, v in my_dict.items() if not pd.isna(v)}
Because pd.isna() also accepts non-numeric scalars such as strings, None, and NaT, this still works when the fields are not numeric.
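As a minimal sketch of the value-filtering variant on data shaped like the question's (drug name mapped to an approval date), the following uses made-up entries; the drug names and dates are hypothetical and only illustrate that pd.isna() treats both NaT and a float NaN as missing:

```python
import pandas as pd

# Hypothetical data: one real approval date and two kinds of missing value.
my_dict = {
    "aspirin": pd.Timestamp("1965-01-01"),
    "unknown_drug": pd.NaT,        # missing datetime
    "other_drug": float("nan"),    # missing float
}

# Keep only the entries whose value is not missing.
clean_dict = {k: v for k, v in my_dict.items() if not pd.isna(v)}
print(clean_dict)
```

Only the "aspirin" entry survives, since the other two values are reported as missing by pd.isna().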
Upvotes: 3
Reputation: 473
I know this is old, but here is a simple approach that worked for me: drop the NaNs up front when reading the CSV. Note that dropna() removes every row that has a NaN in any column.
orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date']).dropna()
I also like to convert to a dictionary at the same time:
orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date']).dropna().to_dict()
Upvotes: 0
Reputation: 5473
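With simplejson:
import simplejson
# ignore_nan=True writes NaN as null, so NaN values come back as None (the keys are kept)
clean_dict = simplejson.loads(simplejson.dumps(my_dict, ignore_nan=True))
# or, depending on your needs: allow_nan=False raises a ValueError if any NaN is present
clean_dict = simplejson.loads(simplejson.dumps(my_dict, allow_nan=False))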
Upvotes: 5
Reputation: 10248
from math import isnan
If NaNs are being stored as keys:
# functional (filter the items; filtering the dict itself would yield only its keys)
clean_dict = dict(filter(lambda kv: not isnan(kv[0]), my_dict.items()))
# dict comprehension
clean_dict = {k: v for k, v in my_dict.items() if not isnan(k)}
If NaNs are being stored as values:
# functional
clean_dict = dict(filter(lambda kv: not isnan(kv[1]), my_dict.items()))
# dict comprehension
clean_dict = {k: v for k, v in my_dict.items() if not isnan(v)}
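One caveat with math.isnan() is that it raises a TypeError for non-numeric input such as strings or None. A small sketch of a guard around it follows; the helper name drop_nan_values and the sample data are made up for illustration:

```python
from math import isnan

def drop_nan_values(d):
    """Return a copy of d without entries whose value is a float NaN."""
    return {
        k: v
        for k, v in d.items()
        # Check the type first so isnan() is never called on a non-float.
        if not (isinstance(v, float) and isnan(v))
    }

# Hypothetical mixed-type data.
my_dict = {"a": 1.5, "b": float("nan"), "c": "text", "d": None}
clean_dict = drop_nan_values(my_dict)
print(clean_dict)
```

The isinstance check also covers NumPy's float64, which subclasses Python's float, but not NaT or other missing-value markers.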
Upvotes: 31
Reputation: 2424
Instead of trying to remove the NaNs from your dictionary, you should further investigate why NaNs are getting there in the first place.
It gets difficult to use NaNs in a dictionary, as a NaN does not equal itself.
Check this out for more information: NaNs as key in dictionaries
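To illustrate why NaN keys are awkward, here is a short sketch using only built-in floats. A lookup succeeds only with the very same NaN object, because dictionaries check identity before equality:

```python
nan = float("nan")
print(nan == nan)            # False: NaN never compares equal to itself

d = {nan: "first"}
print(nan in d)              # True: same object, found by identity
print(float("nan") in d)     # False: a different NaN object is not equal

# Assigning with a new NaN object adds a second key instead of overwriting.
d[float("nan")] = "second"
print(len(d))                # 2
```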
Upvotes: 1