Makis Kans

Reputation: 337

Dataframe still has NaN

Why would this code:

def remove_empties(dataframe):
    classes = list(dataframe)
    new_dataframe = pd.DataFrame(columns=["Value", "Label"])
    for c in classes:
        X=[(k,c) for k in dataframe.loc[:,c] if k]
        T = pd.DataFrame(X, columns =["Value", "Label"] )
        new_dataframe = new_dataframe.append(T)
    return new_dataframe

still produce NaN elements? For example (after printing the result):

298110                               SP  WorkState
298111                               RJ  WorkState
298112                               SP  WorkState
298113                               SP  WorkState
298114                         Scotland  WorkState
298115                              NaN  WorkState

In fact, after applying:

ans = pd.isnull(NDF).any(1).nonzero()[0]
NDF.loc[ans]

I get multiple results:

        Value      Label
1430923   NaN  FirstName
1430923     -   LastName
1532357   jty   LastName
3822535   NaN        NaN
3830294   NaN        NaN
4300250   NaN        NaN
5201009   NaN        NaN
5396591   NaN        NaN
5485877   NaN        NaN
5561799   NaN        NaN
5619806   NaN        NaN
5680834   NaN        NaN
6620272   NaN        NaN
7539369   NaN        NaN
8390860   NaN        NaN
8688976   NaN        NaN

One of these rows isn't empty (jty, LastName), and the NaN row I noticed by simply printing the result isn't among the indexes in ans.

EDIT (solved, but I thought I should post what helped me out anyway; many thanks for all the responses):

k = numpy.nan
if k:
    print("Hi")
else:
    print("NO")

prints Hi

k = None
if k:
    print("Hi")
else:
    print("NO")

prints NO

(Not to mention that I used .loc[ans] instead of .loc[ans,:].)

Upvotes: 1

Views: 134

Answers (1)

tianhua liao

Reputation: 675

First, I think the indentation is wrong, but that is a minor issue.

Next, you need to know that NaN in pandas/numpy is not a simple empty object.

bool(np.nan) evaluates to True, because NaN is a nonzero float. That truthiness test is exactly what your filter relies on in X=[(k,c) for k in dataframe.loc[:,c] if k], so NaN values pass through instead of being removed.

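If you want to remove empty values or detect NaN, use numpy.isnan or pd.isna (numpy.isnan only accepts numeric input, so pd.isna is the safer choice for columns that hold strings). Or simply call dropna on the DataFrame or Series.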
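As a rough sketch of that advice, here is one way remove_empties could be rewritten with pd.notna. The sample data is hypothetical, and pd.concat replaces the repeated DataFrame.append calls, which recent pandas versions no longer provide:

```python
import numpy as np
import pandas as pd


def remove_empties(dataframe):
    """Stack every column into (Value, Label) rows, skipping missing or empty cells."""
    frames = []
    for label in dataframe.columns:
        # pd.notna rejects NaN and None; the extra truthiness check also drops
        # empty strings (and zeros), which is what the original "if k" did.
        kept = [value for value in dataframe[label] if pd.notna(value) and value]
        frames.append(pd.DataFrame({"Value": kept, "Label": [label] * len(kept)}))
    # ignore_index=True gives the result a fresh, unique 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data, not taken from the question.
sample = pd.DataFrame({
    "FirstName": ["Ana", np.nan, ""],
    "WorkState": ["SP", None, "RJ"],
})
result = remove_empties(sample)
print(result)
```

Because the result gets a fresh index, later label-based lookups with .loc do not run into repeated labels.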

As for the second question, I think you misunderstand what nonzero returns. pd.isnull(NDF).any(1) gives a boolean pd.Series, and nonzero returns the integer positions of its True entries, not their index labels. So ans is just an array of natural numbers (0, 1, 2, ...).

Put simply, use NDF.iloc[ans,:], because nonzero returns positions within the Series rather than labels from the DataFrame's index.
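A small sketch of the difference between positions and labels, using a hypothetical frame whose index repeats the way it would after several append calls. Series.nonzero has been removed from recent pandas versions, so np.flatnonzero is used here as a stand-in that yields the same integer positions:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated, non-unique index.
ndf = pd.DataFrame(
    {"Value": ["SP", np.nan, "RJ", np.nan], "Label": ["WorkState"] * 4},
    index=[0, 1, 0, 1],
)

# Boolean Series: True where a row contains at least one missing value.
mask = ndf.isna().any(axis=1)

# Integer positions of the True entries, not index labels.
positions = np.flatnonzero(mask.to_numpy())

rows_by_position = ndf.iloc[positions]  # positional lookup returns the intended rows
rows_by_mask = ndf[mask.to_numpy()]     # boolean indexing gives the same rows directly
print(rows_by_position)
```

Passing positions to .loc would instead treat them as labels, which selects the wrong rows whenever the index is not a plain 0..n-1 range.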

Upvotes: 1
