Reputation: 9414
I have a dataset with some outliers in the age field. Here are the unique values of my data, sorted:
unique = df_csv['AGE'].unique()
print(sorted(unique))
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]
How can I replace any value greater than 80 with the mean or median of my AGE column?
Upvotes: 2
Views: 4497
Reputation: 2111
median = df_csv['AGE'].median()
# using apply; assign the result back, because apply returns a new Series
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)
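A vectorized alternative to apply, as a minimal sketch: Series.mask replaces the values where a condition is True. The small DataFrame below is hypothetical sample data standing in for df_csv, not the asker's real dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv (assumption, not the real dataset)
df_csv = pd.DataFrame({"AGE": [21, 35, 52, 79, 126, 267]})

# Median of the whole column, outliers included
median = df_csv["AGE"].median()

# mask() replaces every value for which the condition is True
df_csv["AGE"] = df_csv["AGE"].mask(df_csv["AGE"] > 80, median)
print(df_csv)
```

With this sample the median is 65.5, so the column ends up with a float dtype.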
Other method: Here
Upvotes: 0
Reputation: 150825
Since you want to modify a column in a dataframe, you should use loc:
# replace `median` with `mean` if you want
df_csv.loc[df_csv['AGE']>80,'AGE'] = df_csv['AGE'].median()
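Note that the line above takes the median of the whole column, outliers included. If the replacement value should not be influenced by the outliers themselves, one possible variation (a sketch with hypothetical sample data; is_outlier and clean_median are illustrative names) computes the median from the remaining rows only:

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv (assumption, not the real dataset)
df_csv = pd.DataFrame({"AGE": [21, 30, 40, 50, 79, 140, 235]})

# Boolean mask marking the rows treated as outliers
is_outlier = df_csv["AGE"] > 80

# Median computed from the non-outlier rows only
clean_median = df_csv.loc[~is_outlier, "AGE"].median()

# Overwrite the outliers in place
df_csv.loc[is_outlier, "AGE"] = clean_median
print(df_csv)
```

In this sample the median of all seven values would be 50, while the median of the five values at or below 80 is 40, so the choice can make a visible difference.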
Upvotes: 4
Reputation: 61930
You could do the following, where series holds the AGE values:
series[series > 80] = series.median()
print(series)
Output
0     21
1     22
2     23
3     24
4     25
      ..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64
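For a self-contained version of the snippet above, here is a sketch that rebuilds series from the sorted unique values listed in the question. Constructing it this way is an assumption; the real column may contain repeated values, which would change the median.

```python
import pandas as pd

# Sorted unique AGE values from the question: 21..75, then 79 and the outliers
values = list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267]
series = pd.Series(values)

# Median of all 63 values, outliers included
median = series.median()

# Boolean indexing: overwrite every value above 80 with the median
series[series > 80] = median
print(series)
```

The median of these 63 values is 52, which matches the trailing rows of the output shown above.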
Upvotes: 1