Reputation: 3
I have a mixed dataframe with str, int and float columns. There are some outliers in the float columns, and I tried to replace them with NaN using
df.mask(df.sub(df.mean()).div(df.std()).abs().gt(2))
I've also tried with numpy's
v = df.values
mask = np.abs((v - v.mean(0)) / v.std(0)) > 2
pd.DataFrame(np.where(mask, np.nan, v), df.index, df.columns)
But both attempts fail, with TypeError: unsupported operand type(s) for -: 'str' and 'float'
and TypeError: must be str, not float
I've also tried applying this only to the column with the outliers, but it doesn't modify anything.
This is what the df looks like:
      dateRep  cases  deaths  countriesAndTerritories  countryterritoryCode  popData2018
0  03/05/2020  134.0     4.0              Afghanistan                   AFG   37172386.0
1  02/05/2020  164.0     4.0              Afghanistan                   AFG   37172386.0
2  01/05/2020  222.0     NaN              Afghanistan                   AFG   37172386.0
3  30/04/2020  122.0     0.0              Afghanistan                   AFG   37172386.0
4  29/04/2020  124.0     3.0              Afghanistan                   AFG   37172386.0
Upvotes: 0
Views: 75
Reputation: 76
You could try something like this (this is to change the "cases" column):
df.loc[abs(df.cases - df.cases.mean())/df.cases.std() > 1, "cases"] = None
Note, however, that I used a Z-score threshold of 1 for the "cases" column here, because in the rows you posted the largest Z-score is about 1.64 (the row with index 2). You are trying to modify values with a Z-score greater than 2, and none of these rows has one, which is why nothing changes.
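If you want to cover all float columns at once instead of naming each one, here is a minimal sketch that restricts the Z-score calculation to the numeric columns, which should avoid the TypeError caused by the str columns. The dataframe is rebuilt from the five rows in the question, and the threshold of 1.5 is only an illustrative value I picked so that something gets masked in this small sample; `threshold`, `num_cols`, `z` and `result` are my own names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the five rows shown in the question.
df = pd.DataFrame({
    "dateRep": ["03/05/2020", "02/05/2020", "01/05/2020", "30/04/2020", "29/04/2020"],
    "cases": [134.0, 164.0, 222.0, 122.0, 124.0],
    "deaths": [4.0, 4.0, np.nan, 0.0, 3.0],
    "countriesAndTerritories": ["Afghanistan"] * 5,
    "countryterritoryCode": ["AFG"] * 5,
    "popData2018": [37172386.0] * 5,
})

threshold = 1.5  # illustrative value, not from the question (which uses 2)

# Only numeric columns take part in the arithmetic; str columns are left alone.
num_cols = df.select_dtypes(include="number").columns

# Column-wise Z-scores. A constant column (std == 0) yields NaN scores,
# which compare as False below and are therefore never masked.
z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()

# mask() returns a new object, so the result has to be assigned back.
result = df.copy()
result[num_cols] = df[num_cols].mask(z.abs() > threshold)

print(result)
```

Since `mask` does not modify the dataframe in place, forgetting to assign its result is another way to end up with an unchanged df.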
Hope this helps!
Upvotes: 1