Reputation: 4534
How would I create a new data frame and replace the values in a specific column with a single statement?
Say I have the following:
import pandas as pd
import numpy as np
student_ids = ['abc123', 'def321', 'qwe098', 'rty135']
extra_junk = ['whoa', 'hey', "don't touch me", 'junk']
gpas = ['3.1', 'junk', 'NaN', '2.75']
aa = np.array([student_ids, extra_junk, gpas]).transpose()
df = pd.DataFrame(data=aa, columns=['student_id', 'extra_junk', 'gpa'])
>>> df
  student_id      extra_junk   gpa
0     abc123            whoa   3.1
1     def321             hey  junk
2     qwe098  don't touch me   NaN
3     rty135            junk  2.75
I can do it in two statements:
df2 = df.copy()
df2['gpa'] = df2['gpa'].replace('junk', 'NaN')
>>> df2
  student_id      extra_junk   gpa
0     abc123            whoa   3.1
1     def321             hey   NaN
2     qwe098  don't touch me   NaN
3     rty135            junk  2.75
Upvotes: 2
Views: 87
Reputation: 51395
Use the nested dictionary syntax of df.replace, which returns a new DataFrame and only touches the column you name:
df2 = df.replace({'gpa':{'junk':'NaN'}})
From the docs:
Nested dictionaries, e.g., {'a': {'b': nan}}, are read as follows: look in column 'a' for the value 'b' and replace it with nan.
Note that using 'NaN' will replace it with a string. If you want it to be an actual NaN, use np.nan.
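To make the string-versus-missing distinction concrete, here is a minimal sketch. It assumes the same data as in the question, rebuilt from a plain dict so the snippet runs on its own. Mapping the existing 'NaN' string to np.nan as well is an addition for illustration, not something the question asked for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_id': ['abc123', 'def321', 'qwe098', 'rty135'],
    'extra_junk': ['whoa', 'hey', "don't touch me", 'junk'],
    'gpa': ['3.1', 'junk', 'NaN', '2.75'],
})

# Nested dict: look only in column 'gpa'; the 'junk' in 'extra_junk' is left alone.
# Both placeholder strings are mapped to a real missing value.
df2 = df.replace({'gpa': {'junk': np.nan, 'NaN': np.nan}})

# isna() flags real missing values; a plain 'NaN' string would not be flagged.
missing = df2['gpa'].isna()
print(missing.tolist())
```

Since replace returns a new object by default, df itself still holds the original strings.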
Upvotes: 3
Reputation: 153500
You can use assign to create a copy and do the replace:
df2 = df.assign(gpa=df.gpa.replace('junk', 'NaN'))
df2
Output:
  student_id      extra_junk   gpa
0     abc123            whoa   3.1
1     def321             hey   NaN
2     qwe098  don't touch me   NaN
3     rty135            junk  2.75
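As a quick check that assign leaves the original frame untouched, here is a small sketch, again assuming the question's data rebuilt from a dict. Passing a callable to assign is a variation on the statement above, not something it requires. It can be handy when the frame has no name yet, for example in the middle of a method chain.

```python
import pandas as pd

df = pd.DataFrame({
    'student_id': ['abc123', 'def321', 'qwe098', 'rty135'],
    'extra_junk': ['whoa', 'hey', "don't touch me", 'junk'],
    'gpa': ['3.1', 'junk', 'NaN', '2.75'],
})

# assign returns a new DataFrame; the callable receives the frame being assigned to.
df2 = df.assign(gpa=lambda d: d['gpa'].replace('junk', 'NaN'))

print(df['gpa'].tolist())   # original column, still contains 'junk'
print(df2['gpa'].tolist())  # new column, 'junk' replaced by the string 'NaN'
```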
Upvotes: 2