Reputation: 337
Why would this code:
def remove_empties(dataframe):
    classes = list(dataframe)
    new_dataframe = pd.DataFrame(columns=["Value", "Label"])
    for c in classes:
        X = [(k, c) for k in dataframe.loc[:, c] if k]
        T = pd.DataFrame(X, columns=["Value", "Label"])
        new_dataframe = new_dataframe.append(T)
    return new_dataframe
still produce NaN elements? Such as (after printing the result):
298110 SP WorkState
298111 RJ WorkState
298112 SP WorkState
298113 SP WorkState
298114 Scotland WorkState
298115 NaN WorkState
In fact after applying :
ans = pd.isnull(NDF).any(1).nonzero()[0]
NDF.loc[ans]
I get multiple results:
Value Label
1430923 NaN FirstName
1430923 - LastName
1532357 jty LastName
3822535 NaN NaN
3830294 NaN NaN
4300250 NaN NaN
5201009 NaN NaN
5396591 NaN NaN
5485877 NaN NaN
5561799 NaN NaN
5619806 NaN NaN
5680834 NaN NaN
6620272 NaN NaN
7539369 NaN NaN
8390860 NaN NaN
8688976 NaN NaN
One of these rows isn't empty (jty, LastName), and the NaN row I noticed by simply printing isn't present in the ans list of indexes.
EDIT: (solved but thought I should post what helped me out anyway, major thanks to all the responses):
k = numpy.nan
if k:
    print("Hi")
else:
    print("NO")
prints Hi
k = None
if k:
    print("Hi")
else:
    print("NO")
prints NO
(not to mention the way I used .loc[ans] instead of .loc[ans,:] )
Upvotes: 1
Views: 134
Reputation: 675
First, the indentation in the code as posted looks wrong, but that is a minor point.
More importantly, NaN in pandas/numpy is not an ordinary empty object. bool(np.nan) evaluates to True, and that truth test is exactly what your filter relies on in X=[(k,c) for k in dataframe.loc[:,c] if k], so NaN values pass straight through it.
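To make the truthiness point concrete, here is a small sketch. The sample values are made up for illustration and are not taken from your data:

```python
import math
import numpy as np

# Hypothetical sample cells: a NaN, a None, an empty string, a zero, a real value.
values = [np.nan, None, "", 0, "SP"]

# bool(v) is what an `if k` filter evaluates for each element.
truthiness = [bool(v) for v in values]

# NaN is a non-zero float, so it is truthy and survives the filter.
kept = [v for v in values if v]

print(truthiness)
print(kept)
```

Only None, the empty string and 0 are filtered out here; the NaN stays in the list next to "SP".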
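If you want to detect or remove NaN, use numpy.isnan or pd.isna. pd.isna also handles None and non-numeric values such as strings, where numpy.isnan raises a TypeError.
Or simply drop the missing rows with DataFrame.dropna.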
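A possible rewrite of remove_empties along these lines is sketched below. It is only one way to do it, and it makes some assumptions: "empty" is taken to mean NaN, None or an empty string (unlike `if k`, it keeps 0 and False), pd.concat is used in place of repeated append calls, and the result gets a fresh index:

```python
import numpy as np
import pandas as pd

def remove_empties(dataframe):
    """Stack every column into (Value, Label) rows, skipping missing or empty cells."""
    frames = []
    for label in dataframe.columns:
        column = dataframe[label]
        # notna() rejects NaN and None; the != "" test also drops empty strings.
        kept = column[column.notna() & (column != "")]
        frames.append(pd.DataFrame({"Value": kept.to_numpy(), "Label": label}))
    # ignore_index=True gives the result a unique 0..n-1 index.
    return pd.concat(frames, ignore_index=True)

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "WorkState": ["SP", np.nan, "RJ"],
    "FirstName": [None, "Ana", ""],
})
out = remove_empties(df)
print(out)
```

Calling out.dropna() afterwards would be another way to remove the NaN rows, but it would leave empty strings in place.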
As for the second question, I think you have misunderstood what nonzero returns. pd.isnull(NDF).any(1) gives you a boolean pd.Series, and nonzero() on it returns integer positions (0, 1, 2, ...), not the labels of the dataframe's index. NDF.loc[ans] then looks those numbers up as index labels, which is why rows that are not empty show up in the result.
More simply, you should use NDF.iloc[ans,:], because nonzero returns positions within the Series instead of index labels of the dataframe.
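The difference between positions and labels can be seen in a small sketch. The frame below is invented for illustration and mimics appending per-column frames without resetting the index, so labels repeat. I use np.flatnonzero here because, as far as I know, Series.nonzero is no longer available in recent pandas releases:

```python
import numpy as np
import pandas as pd

# Two frames stacked without resetting the index: the labels are 0, 1, 2, 0, 1.
ndf = pd.concat([
    pd.DataFrame({"Value": ["SP", np.nan, "RJ"], "Label": "WorkState"}),
    pd.DataFrame({"Value": ["-", "jty"], "Label": "LastName"}),
])

# Integer positions (0-based row numbers) of rows that contain a missing value.
positions = np.flatnonzero(ndf.isna().any(axis=1).to_numpy())

by_position = ndf.iloc[positions]  # only the row that really contains NaN
by_label = ndf.loc[positions]      # every row whose index label equals 1

print(by_position)
print(by_label)
```

With .loc the lookup also returns the ("jty", "LastName") row, because it shares the label 1 with the NaN row. That matches the non-empty row in your output.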
Upvotes: 1