Reputation: 279
I have a CSV file with data formatted as in the following example (my actual data set is much larger):
Image Id,URL,Latitude,Longitude,Address
10758202333,https://farm8.staticflickr.com/7408/10758202333_b6c29d93b1_q.jpg,51.482826,-0.167112,Cadogan Pier Chelsea Embankment Chelsea Royal Borough of Kensington and Chelsea London
23204019400,https://farm6.staticflickr.com/5688/23204019400_fb6879abe3_q.jpg,51.483106,-3.171207,Greggs Station Terrace Plasnewydd Cardiff Wales CF United Kingdom
11243511074,https://farm3.staticflickr.com/2818/11243511074_e1e2f1b99c_q.jpg,51.483297,-0.166534,Albert Bridge Chelsea Embankment Chelsea Royal Borough of Kensington and Chelsea London Greater London England SW3 5SY United Kingdom
22186903335,https://farm6.staticflickr.com/5697/22186903335_de53168305_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22197179851,https://farm6.staticflickr.com/5786/22197179851_a818b17fae_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22174235522,https://farm1.staticflickr.com/589/22174235522_3ffd1de2bb_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22160755536,https://farm1.staticflickr.com/761/22160755536_8e23e9ed32_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
7667114130,https://farm8.staticflickr.com/7269/7667114130_117849250a_q.jpg,51.484563,-3.178181,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
17136775881,https://farm9.staticflickr.com/8780/17136775881_363c2379ef_q.jpg,51.484608,-3.178845,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
7110881411,https://farm9.staticflickr.com/8162/7110881411_f0fe3d7214_q.jpg,51.484644,-3.178099,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
11718453936,https://farm4.staticflickr.com/3700/11718453936_148af12df6_q.jpg,51.484661,-3.179117,King Edward VII Avenue Cathays Cardiff Wales CF United Kingdom
20218915752,https://farm1.staticflickr.com/352/20218915752_4282c1f9b8_q.jpg,51.484683,-3.179147,King Edward VII Avenue Cathays Cardiff Wales CF United Kingdom
My code is as follows. I know it is not much, but for now I simply want to see a cluster plot with centroids. However, I am getting the error "ValueError: array must not contain infs or NaNs".
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, kmeans2, whiten
df = pd.read_csv('dataset_import.csv')
df.head()
coordinates = df.as_matrix(columns=['latitude', 'longitude'])
N = len(coordinates)
k = 100
i = 50
w = whiten(coordinates)
cluster_centroids, closest_centroids = kmeans2(w, k, iter=i, minit='points')
plt.figure(figsize=(10, 6), dpi=100)
plt.scatter(cluster_centroids[:,0], cluster_centroids[:,1], c='r', alpha=.7, s=150)
plt.scatter(w[:,0], w[:,1], c='k', alpha=.3, s=10)
plt.show()
Can anyone shed some light on why this is happening? Perhaps some of the figures in my code are wrong. Thanks!
Upvotes: 1
Views: 12600
Reputation: 689
I ran into the same problem and solved it by removing the NaNs and infs.
import numpy as np

def clean(serie):
    # Keep only finite values, i.e. drop NaN and +/-inf
    return serie[np.isfinite(serie)]
When I draw a plot, I use this function to clean the data on the fly (the original data is left untouched), and plotting works now.
fig = plt.figure()
clean(data[col]).plot(kind='kde')
plt.show()
Or like this:
sns.kdeplot(clean(data[col]), bw=0.1, shade=True, legend=False)
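Applied to the coordinates from the question, the same idea might look like the sketch below. It assumes the CSV header is exactly as shown in the question (`Latitude` and `Longitude`, capitalised). The helper name `finite_coordinates`, the tiny made-up frame, and `k = 2` are hypothetical choices for illustration only. As far as I can tell, in the older pandas versions that still had `as_matrix`, asking for a label that does not exist (for example `latitude` instead of `Latitude`) could return a column of NaNs rather than raise an error, so the sketch checks the labels before cleaning.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans2, whiten


def finite_coordinates(df, columns=("Latitude", "Longitude")):
    """Return a float array of the given columns with NaN/inf rows removed.

    Hypothetical helper; the default labels are assumed to match the CSV header.
    """
    missing = [c for c in columns if c not in df.columns]
    if missing:
        # Fail loudly instead of silently producing NaN columns
        raise KeyError(f"columns not found in CSV header: {missing}")
    coords = df[list(columns)].to_numpy(dtype=float)
    # Keep only rows where every coordinate is finite
    return coords[np.isfinite(coords).all(axis=1)]


# Made-up frame standing in for pd.read_csv('dataset_import.csv'),
# with one NaN and one inf injected to show the cleaning step
df = pd.DataFrame({
    "Latitude": [51.482826, 51.483106, np.nan, 51.484563, 51.484661],
    "Longitude": [-0.167112, -3.171207, -3.176926, np.inf, -3.179117],
})

coordinates = finite_coordinates(df)
w = whiten(coordinates)
cluster_centroids, closest_centroids = kmeans2(w, 2, iter=50, minit='points')
```

Unlike `clean`, which filters a single series, this drops a whole row as soon as either coordinate is non-finite, so latitude and longitude stay paired.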
Upvotes: 2