Reputation: 2185
I have a pandas data frame and created a dictionary from its columns. The dictionary is almost right; the only problem is that I try to filter out the NaN values, but my code doesn't work, so there are NaN keys in the dictionary. My code is the following:
import pandas as pd
from collections import defaultdict
newppmr = defaultdict(list)
for key, row in mr.iterrows():
    # With this line I try to filter out the NaN values but it doesn't work
    if pd.notnull(row['Company nameC']) and pd.notnull(row['Company nameA']) and pd.notnull(row['NEW ID']):
        newppmr[row['NEW ID']] = row['Company nameC']
The output is:
defaultdict(<type 'list'>, {nan: '1347 PROPERTY INS HLDGS INC', 1.0: 'AFLAC INC', 2.0: 'AGCO CORP', 3.0: 'AGL RESOURCES INC', 4.0: 'INVESCO LTD', 5.0: 'AK STEEL HOLDING CORP', 6.0: 'AMN HEALTHCARE SERVICES INC', nan: 'FOREVERGREEN WORLDWIDE CORP'
So I don't know how to filter out the NaN values or what's wrong with my code.
EDIT:
An example of my pandas data frame is:
       CUSIP      Company nameA  AÑO  NEW ID  Company nameC
42020  98912M201  NaN            NaN  NaN     ZAP
42021  989063102  NaN            NaN  NaN     ZAP.COM CORP
42022  98919T100  NaN            NaN  NaN     ZAZA ENERGY CORP
42023  98876R303  NaN            NaN  NaN     ZBB ENERGY CORP
Upvotes: 0
Views: 2374
Reputation: 583
Here is an example of how to remove NaN keys from your dictionary.
Let's create a dict with NaN keys (the same float NaN that appears in numeric arrays):
>>> a = float("nan")
>>> b = float("nan")
>>> d = {a: 1, b: 2, 'c': 3}
>>> d
{nan: 1, nan: 2, 'c': 3}
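Both NaN keys survive because NaN never compares equal to anything, itself included, so the dictionary stores the two `float("nan")` objects as separate keys, and a lookup only succeeds when given the very same object (dict lookup checks identity before equality). A short plain-Python sketch illustrating this:

```python
a = float("nan")
b = float("nan")
d = {a: 1, b: 2, "c": 3}

print(a == a)   # False: NaN is not equal even to itself
print(len(d))   # 3: the two NaN objects are stored as separate keys
print(d[a])     # 1: works because the dict finds the identical object first

try:
    d[float("nan")]  # a fresh NaN object is neither identical nor equal to a or b
except KeyError:
    print("a new NaN object is not found")
```

This is also why a NaN key cannot simply be deleted with `del d[float("nan")]`; filtering the items, as below, is the practical route.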
Now, let's remove all NaN keys:
>>> from math import isnan
>>> c = dict((k, v) for k, v in d.items() if not (isinstance(k, float) and isnan(k)))
>>> c
{'c': 3}
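A variant of the same filter (a sketch, not the only way) uses pandas' own `pd.isnull` as the key test. It accepts any scalar, so no type check is needed, and it also treats NumPy NaN values (`numpy.float64`) and `None` as missing. The dictionary `d` here is a hypothetical example mixing those key types:

```python
import numpy as np
import pandas as pd

# Hypothetical dict mixing a Python NaN, a NumPy NaN, None and real keys.
d = {float("nan"): 1, np.float64("nan"): 2, None: 3, "c": 4, 1.0: 5}

# Keep only the entries whose key pandas does not consider missing.
c = {k: v for k, v in d.items() if not pd.isnull(k)}
print(c)  # {'c': 4, 1.0: 5}
```

Note that `None` is also dropped here; if `None` should be kept as a legitimate key, the `isnan`-based check is the narrower choice.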
Here is another scenario, where the filter works fine. Maybe I'm missing something?
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]})
In [4]: df
Out[4]:
     a    b
0    1  NaN
1    2  NaN
2    3  NaN
3    4    5
4  NaN  NaN
In [5]: for key, row in df.iterrows(): print(pd.notnull(row['a']))
True
True
True
True
False
In [6]: for key, row in df.iterrows(): print(pd.notnull(row['b']))
False
False
False
True
False
In [7]: x = {}
In [8]: for key, row in df.iterrows():
   ...:     if pd.notnull(row['b']) and pd.notnull(row['a']):
   ...:         x[row['b']] = row['a']
   ...:
In [9]: x
Out[9]: {5.0: 4.0}
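For comparison, the same dictionary can be built without `iterrows` by dropping incomplete rows first with `DataFrame.dropna(subset=...)` and zipping the two columns. This is an alternative approach, not the one used in the question; a minimal sketch reusing the `df` from the session above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan], 'b': [np.nan, np.nan, np.nan, 5, np.nan]})

# Drop every row where either column is missing, then pair b (key) with a (value).
complete = df.dropna(subset=['a', 'b'])
x = dict(zip(complete['b'], complete['a']))
print(x)  # {5.0: 4.0}
```

With the question's columns this would be `subset=['Company nameC', 'Company nameA', 'NEW ID']` and `zip(complete['NEW ID'], complete['Company nameC'])`. Since no NaN row ever reaches the dictionary, no key cleanup is needed afterwards.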
Upvotes: 1