Rajnil Guha

Reputation: 435

How to replace Specific values of a particular column in Pandas Dataframe based on a certain condition?

I have a Pandas dataframe which contains students and percentages of marks obtained by them. There are some students whose marks are shown as greater than 100%. Obviously these values are incorrect and I would like to replace all percentage values which are greater than 100% by NaN.

I have tried some code, but it does not do quite what I want.

import numpy as np
import pandas as pd

new_DF = pd.DataFrame({'Student' : ['S1', 'S2', 'S3', 'S4', 'S5'],
                       'Percentages' : [85, 70, 101, 55, 120]})

#  Percentages  Student
#0          85       S1
#1          70       S2
#2         101       S3
#3          55       S4
#4         120       S5

new_DF[(new_DF.iloc[:, 0] > 100)] = np.nan

#  Percentages  Student
#0        85.0       S1
#1        70.0       S2
#2         NaN      NaN
#3        55.0       S4
#4         NaN      NaN

As you can see, the code partly works, but it replaces every value in each row where Percentages is greater than 100 with NaN. I would like to replace only the value in the Percentages column with NaN where it is greater than 100. Is there any way to do that?

Upvotes: 3

Views: 525

Answers (4)

Loochie

Reputation: 2472

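Alternatively, use apply:

new_DF.Percentages = new_DF.Percentages.apply(lambda x: np.nan if x > 100 else x)

or where, which keeps the values that satisfy the condition and replaces the rest (note <= so that exactly 100 is kept):

new_DF.Percentages = new_DF.Percentages.where(new_DF.Percentages <= 100, np.nan)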
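The direction of the condition is easy to get backwards with where, so here is a small sketch comparing it with mask, which replaces where the condition is True instead of keeping. The `demo` frame is made up for illustration (it is not the question's data) and includes a value of exactly 100 to show the boundary.

```python
import pandas as pd

# Illustrative data (an assumption, not from the question): S2 is exactly 100
demo = pd.DataFrame({'Student': ['S1', 'S2', 'S3'],
                     'Percentages': [85, 100, 101]})

# where() KEEPS values where the condition is True, so state the "valid" condition
kept = demo['Percentages'].where(demo['Percentages'] <= 100)

# mask() REPLACES values where the condition is True, so state the "invalid" condition
masked = demo['Percentages'].mask(demo['Percentages'] > 100)

# Both approaches should agree: 85 and 100 survive, 101 becomes NaN
print(kept.equals(masked))
```

Either form should work here; mask tends to read more naturally when the condition describes the bad values.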

Upvotes: 2

anky

Reputation: 75130

Try and use np.where:

new_DF.Percentages=np.where(new_DF.Percentages.gt(100),np.nan,new_DF.Percentages)

or

new_DF.loc[new_DF.Percentages.gt(100),'Percentages']=np.nan

print(new_DF)

  Student  Percentages
0      S1         85.0
1      S2         70.0
2      S3          NaN
3      S4         55.0
4      S5          NaN
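As for why the attempt in the question blanked the whole row: a boolean mask on its own selects rows, so the assignment hits every column in those rows. A minimal sketch of the difference, assuming the question's data with Percentages cast to float up front (my addition, so NaN fits without relying on upcasting); `too_high`, `rows_blanked` and `column_blanked` are names made up for the illustration:

```python
import numpy as np
import pandas as pd

new_DF = pd.DataFrame({'Student': ['S1', 'S2', 'S3', 'S4', 'S5'],
                       'Percentages': [85, 70, 101, 55, 120]})
# Assumption: cast to float first so the column can already hold NaN
new_DF = new_DF.astype({'Percentages': float})

too_high = new_DF['Percentages'] > 100

# A boolean mask alone selects whole rows, so every column in them is overwritten
rows_blanked = new_DF.copy()
rows_blanked[too_high] = np.nan

# A mask plus a column label restricts the assignment to that one column
column_blanked = new_DF.copy()
column_blanked.loc[too_high, 'Percentages'] = np.nan

print(rows_blanked)
print(column_blanked)
```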

Upvotes: 3

wafi

Reputation: 78

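import numpy as np
import pandas as pd

new_DF = pd.DataFrame({'Student' : ['S1', 'S2', 'S3', 'S4', 'S5'],
                      'Percentages' : [85, 70, 101, 55, 120]})

# Cast to float first so the column can hold NaN
new_DF['Percentages'] = new_DF['Percentages'].astype(float)

# Walk the column by index label and blank out values above 100
for index, value in new_DF['Percentages'].items():
    if value > 100:
        # Assign a real NaN through .loc, not the string "nan" via chained indexing
        new_DF.loc[index, 'Percentages'] = np.nan

print(new_DF)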

Upvotes: 0

heena bawa

Reputation: 828

You can use .loc:

new_DF.loc[new_DF['Percentages']>100, 'Percentages'] = np.nan

Output:

  Student  Percentages
0      S1         85.0
1      S2         70.0
2      S3          NaN
3      S4         55.0
4      S5          NaN
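Note that the column prints as 85.0 rather than 85 because NaN is a float, so the integer column is converted to float64. If whole numbers matter, one option is pandas' nullable integer dtype with pd.NA as the missing marker. This is only a sketch and assumes the nullable 'Int64' dtype is acceptable for whatever consumes the frame afterwards; `nullable` is a name I made up.

```python
import pandas as pd

new_DF = pd.DataFrame({'Student': ['S1', 'S2', 'S3', 'S4', 'S5'],
                       'Percentages': [85, 70, 101, 55, 120]})

# Work on a copy so the original frame is left untouched
nullable = new_DF.copy()

# Assumption: 'Int64' (capital I) nullable integers are fine downstream
nullable['Percentages'] = nullable['Percentages'].astype('Int64')

# Same .loc pattern as above, but with pd.NA so the column stays integer
nullable.loc[nullable['Percentages'] > 100, 'Percentages'] = pd.NA

print(nullable)
print(nullable['Percentages'].dtype)
```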

Upvotes: 1
