Nero515

Reputation: 41

Problem while trying to replace outliers in pandas

Okay, so I've been trying to clean data for a machine learning project. I'm using z-scores for outlier detection. The dataset contains different types of glass (1-7), and I want to go through each glass type, find the outliers, and replace them with the mean sodium content for that type of glass (the "Na" column). The weird thing is that the algorithm works for glass Types 1 and 2, but when it gets to Type 3 it raises a KeyError. Do you know what the problem might be?

z = stats.zscore(DataFrame.Na)
threshold = 1.99

for t in DataFrame.Type.unique():
    z = stats.zscore(DataFrame.Na[DataFrame.Type==t])
    print([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]])
    DataFrame.Na[DataFrame.Type==t] = DataFrame.Na[DataFrame.Type==t].replace([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]],np.mean(DataFrame.Na[DataFrame.Type==t]))

And the output is:

[17    14.36
21    14.77
Name: Na, dtype: float64]
[70     14.86
105    11.45
106    10.73
108    14.43
110    11.23
111    11.02
Name: Na, dtype: float64]
[149    12.16
Name: Na, dtype: float64]

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

KeyError: 0

Does anyone know what could be wrong here? If you need any additional information, I will provide it. I've been thinking about this for about 2 hours and I don't have a clue...

Upvotes: -1

Views: 102

Answers (2)

xanatos

Reputation: 190

The KeyError: 0 means that somewhere you are trying to access or set the value at row label 0 in a DataFrame or Series that has no row 0. Try breaking your long line into smaller steps and printing each intermediate result to the console; you'll likely find the failing step that way.
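
As a rough illustration of "breaking it up", here is a minimal sketch of one way to do the per-type replacement with a single boolean mask instead of chained indexing plus replace(). It assumes the column names "Type" and "Na" from the question; the function name replace_outliers_with_group_mean and the sample data are made up for illustration. The z-scores use the population standard deviation (ddof=0), which is the default in scipy.stats.zscore, and the replacement mean is the full group mean including the outliers, as in the question's code.

```python
import pandas as pd


def replace_outliers_with_group_mean(df, value_col="Na", group_col="Type",
                                     threshold=1.99):
    """Return a copy of df where per-group z-score outliers in value_col
    are replaced by that group's mean (hypothetical helper)."""
    grouped = df.groupby(group_col)[value_col]

    # Per-row group statistics, aligned to the original index.
    group_mean = grouped.transform("mean")
    group_std = grouped.transform(lambda s: s.std(ddof=0))

    # z-score of each row within its own group. A zero std gives NaN,
    # and NaN > threshold is False, so such rows are left alone.
    z = (df[value_col] - group_mean) / group_std
    is_outlier = z.abs() > threshold

    out = df.copy()
    # Keep values where the row is NOT an outlier, otherwise use the mean.
    out[value_col] = df[value_col].where(~is_outlier, group_mean)
    return out


# Made-up sample with an index that does not start at 0.
sample = pd.DataFrame(
    {"Type": [1] * 9 + [3] * 9,
     "Na": [13.0] * 8 + [20.0] + [12.0] * 8 + [18.0]},
    index=range(100, 118),
)
cleaned = replace_outliers_with_group_mean(sample)
print(cleaned[cleaned["Na"] != sample["Na"]])
```

Because everything is aligned on the DataFrame's own index, no step looks up a positional label such as 0, and nothing is assigned through a chained slice, so the SettingWithCopyWarning should not appear either.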

Upvotes: 0

xanatos

Reputation: 190

I can't comment so I'll post my comment as an answer.

Are you trying to detect "outliers" or "outliners"? I'm not just being pedantic here, as they are different statistical concepts.

Upvotes: 0
