Jwegs32

Reputation: 151

Is there a way to remove nan from a dictionary filled with data?

I have a dictionary that is filled with data from two files I imported, but some of the data comes out as nan. How do I remove the pieces of data with nan?

My code is:

import matplotlib.pyplot as plt 
from pandas import Timestamp
import numpy as np   
from datetime import datetime
import pandas as pd
import collections

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt', sep='~', parse_dates=['Approval_Date'])
specificdrugs = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\Drugs.txt', sep=',')

"""This is a dictionary that collects data from the .txt file
This dictionary has a key,value pair for every generic name with its corresponding approval date """
drugdict={}
for d in specificdrugs['Generic Name']:
    drugdict.dropna()
    drugdict[d]=orangebook[orangebook.Ingredient==d.upper()]['Approval_Date'].min()

What should I add or take away from this code to make sure that there are no key,value pairs in the dictionary with a value of nan?

Upvotes: 15

Views: 53747

Answers (5)

Raghul Raj

Reputation: 1458

A slightly modified version of twinlakes's approach is to use pandas.isna() instead of math.isnan().

If NaNs are being stored as keys:

import pandas as pd

# functional (filter over the items so the result is still a dict)
clean_dict = dict(filter(lambda kv: not pd.isna(kv[0]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not pd.isna(k)}

If NaNs are being stored as values:

# functional (filter over the items so the result is still a dict)
clean_dict = dict(filter(lambda kv: not pd.isna(kv[1]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not pd.isna(my_dict[k])}

This way it still works when the fields are non-numeric: pd.isna() returns False for a string, whereas math.isnan() raises a TypeError.
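As a minimal sketch of that difference, assuming a small made-up dictionary (the keys and values below are hypothetical, not the asker's data), pd.isna() treats both float NaN and pandas NaT as missing and leaves strings alone:

```python
import math
import pandas as pd

# Hypothetical sample data mixing dates, strings, and missing values.
my_dict = {
    "aspirin": pd.Timestamp("1985-03-01"),
    "ibuprofen": pd.NaT,        # missing date
    "label": "n/a",             # plain string, not a missing value
    "dose": float("nan"),       # missing number
}

# pd.isna() accepts any scalar, so no type check is needed first.
clean_dict = {k: v for k, v in my_dict.items() if not pd.isna(v)}

# math.isnan() only accepts real numbers and rejects the string.
try:
    math.isnan(my_dict["label"])
    isnan_accepts_str = True
except TypeError:
    isnan_accepts_str = False

print(list(clean_dict))
print(isnan_accepts_str)
```

Note that this only holds for scalar values. If a value is itself a list or array, pd.isna() returns an array and the `not` test would need a different check.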

Upvotes: 3

Colin Miles

Reputation: 473

I know this is old, but here is what worked for me, and it is simple: remove the NaNs up front when reading the CSV. Note that dropna() with no arguments drops every row that has a NaN in any column:

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt', sep='~', parse_dates=['Approval_Date']).dropna()

I also like to convert to a dictionary at the same time (by default, to_dict() returns a nested {column: {row index: value}} mapping):

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt', sep='~', parse_dates=['Approval_Date']).dropna().to_dict()
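If dropping every incomplete row is too aggressive, one option is to restrict dropna() to the date column. The following is only a sketch: the inline text stands in for products2.txt, and the Strength column and all sample rows are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for products2.txt; only the column names
# Ingredient and Approval_Date come from the question.
raw = io.StringIO(
    "Ingredient~Approval_Date~Strength\n"
    "ASPIRIN~1985-03-01~\n"
    "IBUPROFEN~~200MG\n"
    "ASPIRIN~1990-06-15~500MG\n"
)
orangebook = pd.read_csv(raw, sep="~", parse_dates=["Approval_Date"])

# Drops any row with a missing value in any column.
all_complete = orangebook.dropna()

# Drops only the rows whose Approval_Date is missing.
dated = orangebook.dropna(subset=["Approval_Date"])

# Default to_dict() orientation: {column: {row index: value}}.
as_dict = dated.to_dict()

print(len(all_complete), len(dated))
print(as_dict["Ingredient"])
```

Here the first ASPIRIN row survives the subset version even though its Strength is missing, while the plain dropna() discards it.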

Upvotes: 0

hangc

Reputation: 5473

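With simplejson:

import simplejson

# ignore_nan=True serializes NaN (and inf/-inf) as null, so those values come back as None
clean_dict = simplejson.loads(simplejson.dumps(my_dict, ignore_nan=True))
# or, depending on your needs: allow_nan=False raises a ValueError if any NaN is present
clean_dict = simplejson.loads(simplejson.dumps(my_dict, allow_nan=False))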

Upvotes: 5

twinlakes

Reputation: 10248

from math import isnan

If NaNs are being stored as keys:

# functional (filter over the items so the result is still a dict)
clean_dict = dict(filter(lambda kv: not isnan(kv[0]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not isnan(k)}

If NaNs are being stored as values:

# functional (filter over the items so the result is still a dict)
clean_dict = dict(filter(lambda kv: not isnan(kv[1]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not isnan(my_dict[k])}
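One caveat: math.isnan() raises a TypeError for values that are not real numbers, such as strings or None. A possible workaround, sketched here with a hypothetical helper named is_nan and made-up sample data, is to check the type first:

```python
from math import isnan

def is_nan(value):
    """Return True only for float NaN; other types are never treated as NaN."""
    return isinstance(value, float) and isnan(value)

# Hypothetical sample data with mixed value types.
my_dict = {"a": 1.5, "b": float("nan"), "c": "text", "d": None}

# Keep every pair whose value is not a float NaN.
clean_dict = {k: v for k, v in my_dict.items() if not is_nan(v)}

print(clean_dict)
```

This guard does not catch pandas NaT, so for date columns pd.isna() is probably the safer test.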

Upvotes: 31

Greg Hilston

Reputation: 2424

Instead of trying to remove the NaNs from your dictionary, you should further investigate why NaNs are getting there in the first place.

It gets difficult to use NaNs in a dictionary, as a NaN does not equal itself.

Check this out for more information: NaNs as key in dictionaries
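To illustrate why NaN keys are awkward, here is a short sketch using only built-in floats (the variable names are arbitrary). Because a NaN never compares equal to anything, two separate NaN objects become two separate keys, and a lookup only succeeds with the exact same object:

```python
nan_a = float("nan")
nan_b = float("nan")

# A NaN is not equal to itself.
self_equal = nan_a == nan_a

# Two distinct NaN objects are stored as two distinct keys.
d = {nan_a: "first"}
d[nan_b] = "second"

# Lookup works only via object identity, not via equality.
found_same_object = nan_a in d
found_new_object = float("nan") in d

print(self_equal, len(d), found_same_object, found_new_object)
```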

Upvotes: 1
