Samm Flynn

Reputation: 314

How to deal with nan values in numpy

I am trying to find clusters using DBSCAN from scikit-learn. Here is the code:

db = DBSCAN(eps=.2, min_samples=5).fit(p)
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([p[cluster_labels == n] for n in range(num_clusters)])
print(len(clusters))
C = np.empty(shape=(len(clusters), 2), dtype=np.float16)
for i in range(len(clusters)):
    C[i] = np.mean(clusters[i], axis=0)
print(C)

And I get this runtime warning:

C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)

4

[[-1.369   1.895 ]
 [ 0.2095  0.763 ]
 [-0.572   1.688 ]
 [    nan     nan]]

Should I just suppress it using:

import warnings

warnings.simplefilter("ignore")

Or is there a way to fix it properly, such as removing the row that contains NaN values?

Edit: So far, ignoring the NaN values hasn't caused a problem for what I am trying to do. If I try this:

print(len(clusters))
C = np.empty(shape=(len(clusters), 2))
for i in range(len(clusters)):
    if not np.isnan(C[i][0]):
        print(np.isnan(C[i][0]))
        C[i] = np.mean(clusters[i], axis=0, dtype=np.float64)
        print(C[i][0])

print(C)

I get this output -

C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
4
False
-1.4311423570879045
False
0.14525776544683858
False
-0.7161999985172942
False
nan
[[-1.43114236  1.9280001 ]
 [ 0.14525777  0.79508425]
 [-0.7162      1.73658117]
 [        nan         nan]]

I don't get it: np.isnan(C[i][0]) returns False, but the value is NaN. What am I missing? My dataset p is too big to show here, but it contains no NaN elements and no element is too close to zero.

Upvotes: 1

Views: 3543

Answers (3)

Noah Weber

Reputation: 332

Reading the warning, we can deduce that the problem is in

np.mean(clusters[i], axis=0)

Since you are taking the mean of an empty array (the warning says "Mean of empty slice"), the result is NaN and NumPy emits this warning; had the slice contained any numeric rows, you would not get it. Suggestion: decide whether you want these NaN rows in your cluster results at all.
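One likely reason a slice ends up empty with the code in the question: DBSCAN labels noise points as -1, so len(set(cluster_labels)) counts the noise label as a cluster, and range(num_clusters) then asks for a label that no point carries. A minimal sketch of iterating over the labels that actually occur, using made-up values (the `p` and `cluster_labels` arrays below are hypothetical stand-ins, not the asker's data):

```python
import numpy as np

# Hypothetical stand-ins for the data and for db.labels_ (DBSCAN marks noise as -1).
p = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0], [9.0, -9.0]])
cluster_labels = np.array([0, 0, 1, 1, -1])

# Iterate over the labels that actually occur, skipping the noise label,
# instead of over range(len(set(cluster_labels))).
real_labels = [n for n in np.unique(cluster_labels) if n != -1]

# One centroid per real cluster; no slice is ever empty.
C = np.array([p[cluster_labels == n].mean(axis=0) for n in real_labels])
print(C)
```

As for the np.isnan check in the question's edit: it runs before C[i] is assigned, so it inspects the uninitialized contents of np.empty rather than the computed mean, which is why it prints False.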

Upvotes: 0

CumminUp07

Reputation: 1968

It really depends on your data and your specific problem as to how you would handle this. Is this something that is expected? Should you impute the values? If you want to remove the rows with nan you could do something like this:

p[~np.isnan(p).any(axis=1)]
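The same mask can be applied to the array of cluster means instead of the input, which is closer to what the question asks for (dropping the row that contains NaN). A small sketch, using the values printed in the question:

```python
import numpy as np

# Centroids copied from the output printed in the question.
C = np.array([
    [-1.369, 1.895],
    [0.2095, 0.763],
    [-0.572, 1.688],
    [np.nan, np.nan],
])

# Keep only the rows in which no element is NaN.
C_clean = C[~np.isnan(C).any(axis=1)]
print(C_clean)
```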

Upvotes: 0

Karl Philipp Sy Fabre

Reputation: 155

Convert your NaN values to zero like this (assuming data is a pandas DataFrame or Series; NumPy arrays have no replace method):

data = data.replace(np.nan,0)
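For a plain NumPy array such as the C in the question, np.nan_to_num does the equivalent. This is a sketch of the mechanics only: a zero written in place of a missing centroid cannot be told apart from a real centroid at the origin, so whether it is appropriate depends on how the result is used. The array values below are illustrative.

```python
import numpy as np

# Illustrative array with one all-NaN row, like the last row of C in the question.
C = np.array([[0.2095, 0.763], [np.nan, np.nan]])

# By default, nan_to_num replaces NaN with 0.0 and returns a new array.
C_filled = np.nan_to_num(C)
print(C_filled)
```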

Upvotes: 1
