Reputation: 41
Okay, so I've been trying to clean data for a machine learning project. I'm using the Z-score for outlier detection. The dataset contains different types of glass (Type 1 to 7), and I want to go through each glass type, find the outliers, and replace them with the mean sodium value for that type of glass (the "Na" column). The weird thing is that the algorithm works for glass Types 1 and 2, but when it gets to Type 3 it gives a ValueError. Do you guys know what the problem might be?
z = stats.zscore(DataFrame.Na)
threshold = 1.99
for t in DataFrame.Type.unique():
    z = stats.zscore(DataFrame.Na[DataFrame.Type==t])
    print([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]])
    DataFrame.Na[DataFrame.Type==t] = DataFrame.Na[DataFrame.Type==t].replace([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]],np.mean(DataFrame.Na[DataFrame.Type==t]))
And the output is:
[17 14.36
21 14.77
Name: Na, dtype: float64]
[70 14.86
105 11.45
106 10.73
108 14.43
110 11.23
111 11.02
Name: Na, dtype: float64]
[149 12.16
Name: Na, dtype: float64]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
KeyError: 0
Does anyone know what could be wrong with this? If you need any additional information I will provide it. I've been thinking about this for about 2 hours and I don't have a clue...
Upvotes: -1
Views: 102
Reputation: 190
What is happening is that somewhere you are trying to set the value at row 0 in a dataframe that does not have a row 0. Try breaking up your long lines and printing the intermediate results to the console; you'll likely find the error that way.
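As a rough sketch of what "breaking up the long lines" could look like: the small frame below is made-up stand-in data (only the "Type" and "Na" column names come from the question), and `df`, `in_type`, `na`, `is_outlier` and `outlier_labels` are names I'm introducing just for illustration. It selects the outliers by their index labels and writes back with a single `.loc` assignment rather than `replace` on a chained selection, which is one way to avoid both the SettingWithCopyWarning and label lookups on a row that doesn't exist. I can't say for sure it matches your real data, so treat it as a starting point.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the glass data; only "Type" and "Na" matter here.
df = pd.DataFrame({
    "Type": [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3],
    "Na":   [13.0, 13.1, 12.9, 13.2, 13.0, 20.0,
             14.0, 14.1, 13.9, 14.2, 14.0, 8.0],
})
threshold = 1.99

for t in df["Type"].unique():
    in_type = df["Type"] == t                 # boolean mask for this glass type
    na = df.loc[in_type, "Na"]                # sodium values for this type only
    z = stats.zscore(na.to_numpy())           # z-scores within the type (plain array)
    is_outlier = np.abs(z) > threshold        # positional boolean array
    outlier_labels = na.index[is_outlier]     # index labels of the outlier rows
    print(t, list(outlier_labels))            # inspect each step before assigning
    # One .loc assignment on the original frame, no chained indexing.
    # As in the question, the mean still includes the outliers themselves.
    df.loc[outlier_labels, "Na"] = na.mean()
```

Printing `outlier_labels` per type shows which index labels are about to be written, so a lookup on a label that isn't there becomes visible before the assignment happens.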
Upvotes: 0
Reputation: 190
I can't comment so I'll post my comment as an answer.
Are you trying to detect "outliers" or "outliners"? I'm not just being pedantic here, as they are different statistical concepts.
Upvotes: 0