Reputation: 39
For an assignment I have to remove the outliers from a CSV file using different methods.
I tried working with the 'height' column after loading the CSV into a pandas DataFrame, but my code either raises errors or leaves the outliers untouched. I am trying to use the KNN method in Python.
The code that I wrote is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_blobs
df = pd.read_csv("data.csv")
print(df.describe())
print(df.columns)
df['height'].plot(kind='hist')
print(df['height'].value_counts())
data= pd.DataFrame(df['height'],df['active'])
k=1
knn = NearestNeighbors(n_neighbors=k)
knn.fit([df['height']])
neighbors_and_distances = knn.kneighbors([df['height']])
knn_distances = neighbors_and_distances[0]
tnn_distance = np.mean(knn_distances, axis=1)
print(knn_distances)
PCM = df.plot(kind='scatter', x='x', y='y', c=tnn_distance, colormap='viridis')
plt.show()
And the data looks something like this:
id,age,gender,height,weight,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio
0,18393,2,168,62.0,110,80,1,1,0,0,1,0
1,20228,1,156,85.0,140,90,3,1,0,0,1,1
2,18857,1,50,64.0,130,70,3,1,0,0,0,1
3,17623,2,250,82.0,150,100,1,1,0,0,1,1
I don't know what I'm missing or doing wrong.
Upvotes: 0
Views: 93
Reputation: 761
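df = pd.read_csv("data.csv")
# Take a copy of the two columns so a new column can be added without a SettingWithCopyWarning
X = df[['height', 'weight']].copy()
X.plot(kind='scatter', x='weight', y='height', colormap='viridis')
plt.show()
# Ask for 2 neighbours: the first one returned for each training point is the point itself
knn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = knn.kneighbors(X)
# Column 1 holds the distance to the closest other point
X['distances'] = distances[:,1]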
X.distances
0 1.000000
1 1.000000
2 1.000000
3 3.000000
4 1.000000
5 1.000000
6 133.958949
7 100.344407
...
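Column 1 is used rather than column 0 because, when the same points are passed to both fit and kneighbors, each point's closest match is itself at distance 0. A minimal sketch on invented numbers (the `pts` array is an assumption for illustration, not the asker's data):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy (height, weight) points, invented for illustration; the last one sits far from the rest.
pts = np.array([[160.0, 60.0], [161.0, 60.0], [162.0, 61.0], [250.0, 82.0]])

knn = NearestNeighbors(n_neighbors=2).fit(pts)
distances, indices = knn.kneighbors(pts)

self_dist = distances[:, 0]      # distance from each point to itself
nearest_other = distances[:, 1]  # distance to the closest other point
print(self_dist)
print(nearest_other)
```

The isolated point should show a much larger value in the second column than the clustered ones, which is what makes that column usable as an outlier score.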
X.plot(kind='scatter', x='weight', y='height', c='distances', colormap='viridis')
plt.show()
MAX_DIST = 10
X.loc[X.distances < MAX_DIST, ['height', 'weight']]
height weight
0 162 78.0
1 162 78.0
2 151 76.0
3 151 76.0
4 171 84.0
...
And finally to filter out all the outliers:
MAX_DIST = 10
X = X[X.distances < MAX_DIST]
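As for the code in the question: `knn.fit([df['height']])` wraps the whole column in a list, so scikit-learn sees a single sample whose features are all the heights, and `n_neighbors=1` can only return that sample's zero distance to itself. If only `height` should be used, something along these lines may work. The heights below and the threshold of 10 are made up for illustration:

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Invented heights standing in for df['height']; 50 and 250 are the suspicious values.
df = pd.DataFrame({"height": [168, 156, 50, 250, 165, 170, 158, 172]})

# scikit-learn expects shape (n_samples, n_features), so keep height as a one-column frame.
X = df[["height"]]

knn = NearestNeighbors(n_neighbors=2).fit(X)
distances, _ = knn.kneighbors(X)

# Column 1 is the distance to the closest other row (column 0 is the row itself).
df["knn_distance"] = distances[:, 1]

MAX_DIST = 10  # assumed threshold; tune it by looking at the distance distribution
cleaned = df[df["knn_distance"] < MAX_DIST]
print(cleaned)
```

With these sample values the rows with heights 50 and 250 are dropped, since no other height lies within 10 of them.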
Upvotes: 1