Amira Elsayed Ismail

Reputation: 9414

How to replace values greater than specific value in dataframe column?

I have a dataset with some outliers in the AGE field. Here are the unique values of that column, sorted:

unique = df_csv['AGE'].unique()
print(sorted(unique))

[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]

How can I replace any value greater than 80 with the mean or median of my Age column?

Upvotes: 2

Views: 4497

Answers (3)

ombk

Reputation: 2111

median = df_csv['AGE'].median()
# using apply; assign the result back, because apply returns a new Series
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)

Other method: Here

Upvotes: 0

Quang Hoang

Reputation: 150825

Since you want to work with a column in a dataframe, you should resort to loc:

# replace `median` with `mean` if you want
df_csv.loc[df_csv['AGE'] > 80, 'AGE'] = df_csv['AGE'].median()
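One thing to keep in mind with the loc approach: df_csv['AGE'].median() is computed over the whole column, outliers included. The median is fairly robust to that, but the mean is not, so it may be worth computing the replacement value from the non-outlier rows only. Below is a minimal sketch of that variation. The helper name replace_outliers_with_clean_median and the sample data are made up for illustration; they are not from the question.

```python
import pandas as pd

def replace_outliers_with_clean_median(df, column, threshold):
    """Replace values above `threshold` with the median of the remaining values.

    Hypothetical helper for illustration; returns a modified copy of `df`.
    """
    out = df.copy()
    # Cast to float so a fractional median can be stored without a dtype clash.
    out[column] = out[column].astype(float)
    is_outlier = out[column] > threshold
    # Median computed from the non-outlier rows only.
    clean_median = out.loc[~is_outlier, column].median()
    out.loc[is_outlier, column] = clean_median
    return out

# Made-up sample standing in for df_csv.
df_csv = pd.DataFrame({"AGE": [21, 35, 52, 79, 126, 267]})
cleaned = replace_outliers_with_clean_median(df_csv, "AGE", 80)
print(cleaned["AGE"].tolist())
```

Swapping .median() for .mean() inside the helper gives the mean-based version of the same idea.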

Upvotes: 4

Dani Mesejo

Reputation: 61930

You could do:

# `series` is assumed to be a pandas Series holding the ages (its construction is not shown here)
series[series > 80] = series.median()
print(series)

Output

0     21
1     22
2     23
3     24
4     25
      ..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64
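For a self-contained version, here is a sketch that assumes the series was built from the 63 sorted unique ages listed in the question (the answer does not show this step, but the length and median match). It uses Series.mask, which returns a new Series instead of modifying the original in place.

```python
import pandas as pd

# Assumption: rebuild the 63 sorted unique ages listed in the question.
ages = list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267]
series = pd.Series(ages)

# mask() replaces values where the condition is True and returns a new Series.
cleaned = series.mask(series > 80, series.median())

print(cleaned.tail(7).tolist())  # the seven former outliers
print(series.tail(3).tolist())   # the original Series is left untouched
```

If you want to keep the original data around for comparison, this is a bit safer than assigning through a boolean index.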

Upvotes: 1
