Reputation: 314
I am trying to find clusters using DBSCAN from scikit-learn. Here is the code:
db = DBSCAN(eps=.2, min_samples=5).fit(p)
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([p[cluster_labels == n] for n in range(num_clusters)])
print(len(clusters))
C = np.empty(shape=(len(clusters), 2), dtype=np.float16)
for i in range(len(clusters)):
    C[i] = np.mean(clusters[i], axis=0)
print(C)
And I get this runtime warning:
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\fromnumeric.py:3257:
RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\_methods.py:154:
RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
4
[[-1.369 1.895 ]
[ 0.2095 0.763 ]
[-0.572 1.688 ]
[ nan nan]]
Should I just suppress it using:
import warnings
warnings.simplefilter("ignore")
Or is there a way to fix it properly, such as removing the row that contains NaN values?
Edit: So far, ignoring the NaN values has not caused any problem for what I am trying to do. But if I try this:
print(len(clusters))
C = np.empty(shape=(len(clusters), 2))
for i in range(len(clusters)):
    if not np.isnan(C[i][0]):
        print(np.isnan(C[i][0]))
        C[i] = np.mean(clusters[i], axis=0, dtype=np.float64)
        print(C[i][0])
print(C)
I get this output -
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
C:\Users\USER\PycharmProjects\REALDEPTH\venv\lib\site-packages\numpy\core\_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
4
False
-1.4311423570879045
False
0.14525776544683858
False
-0.7161999985172942
False
nan
[[-1.43114236 1.9280001 ]
[ 0.14525777 0.79508425]
[-0.7162 1.73658117]
[ nan nan]]
I don't get it: np.isnan(C[i][0]) returns False, but the value is NaN. What am I missing? My dataset p is too big to show here, but it contains no NaN elements and no element is too close to zero.
Upvotes: 1
Views: 3543
Reputation: 332
Reading the warning, we can deduce that the problem is in
np.mean(clusters[i], axis=0)
The warning says "Mean of empty slice": you are taking the mean of an array that has no elements, so the result is NaN (had the slice contained any numeric values, you would not get this warning). Suggestion: decide whether you want these NaN rows to appear in your cluster results at all.
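One likely reason the slice is empty, assuming the labels come from scikit-learn's DBSCAN: noise points are labelled -1, so len(set(cluster_labels)) counts the noise label as a cluster and range(num_clusters) then asks for a label that has no points. A minimal sketch of this, where the small `p` and `cluster_labels` arrays are made-up stand-ins for the real data:

```python
import numpy as np

# Hypothetical stand-ins for the question's data: 2-D points and DBSCAN-style
# labels, where -1 marks noise and the real clusters are numbered 0, 1, 2.
p = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0],
              [9.0, 1.0], [9.0, 1.2], [50.0, 50.0]])
cluster_labels = np.array([0, 0, 1, 1, 2, 2, -1])

# The noise label is counted too, so this is one more than the number of
# real clusters, and the highest label in range(naive_count) has no points.
naive_count = len(set(cluster_labels))
empty_size = p[cluster_labels == naive_count - 1].shape[0]

# Iterate over the labels that actually occur, skipping the noise label.
real_labels = [int(n) for n in np.unique(cluster_labels) if n != -1]
C = np.array([p[cluster_labels == n].mean(axis=0) for n in real_labels])
print(C)
```

This would also explain the edit in the question: the np.isnan check there runs before C[i] is assigned, so it tests whatever np.empty left in memory rather than the mean.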
Upvotes: 0
Reputation: 1968
How to handle this really depends on your data and your specific problem. Is this something that is expected? Should you impute the values? If you want to remove the rows containing NaN, you could do something like this:
p[~np.isnan(p).any(axis=1)]
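Since the question says p itself has no NaN, the same mask may be more useful on the centroid array. A small sketch, using the centroid values printed in the question's first output as sample data (the name `C_clean` is only illustrative):

```python
import numpy as np

# Centroids copied from the question's first output; the last row is NaN.
C = np.array([[-1.369, 1.895],
              [0.2095, 0.763],
              [-0.572, 1.688],
              [np.nan, np.nan]])

# Keep only the rows in which no coordinate is NaN.
C_clean = C[~np.isnan(C).any(axis=1)]
print(C_clean)
```

Note that this only hides the symptom; the empty cluster is still being computed upstream.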
Upvotes: 0
Reputation: 155
Convert your NaN values to zero like this:
data = data.replace(np.nan,0)
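The .replace call above is a pandas method, so it assumes `data` is a DataFrame or Series. For a plain NumPy array such as the centroid array in the question, a rough equivalent is np.nan_to_num. A sketch with made-up values:

```python
import numpy as np

# Made-up centroid array with one all-NaN row, as in the question.
C = np.array([[0.2095, 0.763],
              [np.nan, np.nan]])

# nan_to_num returns a copy in which NaN becomes 0.0 by default.
C_zeroed = np.nan_to_num(C)
print(C_zeroed)
```

Be aware that a zeroed row then looks like a real centroid at the origin, which may or may not be acceptable for what you do with the result.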
Upvotes: 1