Python: Copy a pandas DataFrame and replace a specific column's values in a single step

Question

How would I create a new data frame and replace the values in a specific column with a single statement?

Say I have the following:

import pandas as pd
import numpy as np

student_ids = ['abc123', 'def321', 'qwe098', 'rty135']
extra_junk  = ['whoa', 'hey', 'don\'t touch me', 'junk']
gpas        = ['3.1', 'junk', 'NaN', '2.75']
aa          = np.array([student_ids, extra_junk, gpas]).transpose()

df = pd.DataFrame(data=aa, columns=['student_id', 'extra_junk', 'gpa'])

>>> df
  student_id      extra_junk   gpa
0     abc123            whoa   3.1
1     def321             hey  junk
2     qwe098  don't touch me   NaN
3     rty135            junk  2.75

I can do it in two statements:

df2 = df.copy()
df2['gpa'] = df2['gpa'].replace('junk', 'NaN')

>>> df2
  student_id      extra_junk   gpa
0     abc123            whoa   3.1
1     def321             hey   NaN
2     qwe098  don't touch me   NaN
3     rty135            junk  2.75

sacuL · Accepted Answer

Use the nested dictionary syntax of df.replace:

df2 = df.replace({'gpa':{'junk':'NaN'}})

From the docs:

Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan.
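A minimal sketch of that behaviour, assuming the same data as in the question (built here from a dict of lists instead of a transposed NumPy array, which should give the same string columns). The extra_junk column also contains 'junk' in its last row: the nested form only looks in gpa and leaves it alone, whereas a flat df.replace('junk', 'NaN') would change it too. Because replace returns a new DataFrame by default (inplace=False), the original df is left unchanged, which is what makes this a one-step copy-and-replace.

```python
import pandas as pd

# Same data as the question, as a dict of lists
df = pd.DataFrame({
    'student_id': ['abc123', 'def321', 'qwe098', 'rty135'],
    'extra_junk': ['whoa', 'hey', "don't touch me", 'junk'],
    'gpa': ['3.1', 'junk', 'NaN', '2.75'],
})

# Nested dict: only look in column 'gpa' for the value 'junk'
df2 = df.replace({'gpa': {'junk': 'NaN'}})
print(df2['gpa'].tolist())         # ['3.1', 'NaN', 'NaN', '2.75']
print(df2['extra_junk'].tolist())  # ['whoa', 'hey', "don't touch me", 'junk']

# Flat replace for comparison: 'junk' is replaced in every column
flat = df.replace('junk', 'NaN')
print(flat['extra_junk'].tolist())  # ['whoa', 'hey', "don't touch me", 'NaN']

# The original frame is untouched because replace returned a new DataFrame
print(df['gpa'].tolist())          # ['3.1', 'junk', 'NaN', '2.75']
```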

Note that using 'NaN' replaces the value with the string 'NaN'. If you want an actual NaN (a missing value), use np.nan instead.
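A short sketch of the difference, again assuming the question's data. Since the 'NaN' already present in gpa also came in as text, the nested dict below maps it to np.nan as well. The last lines use a different technique, not part of the accepted answer: pd.to_numeric with errors='coerce' inside DataFrame.assign, which turns anything unparseable into NaN and returns a new frame in one statement. Whether that fits depends on whether every non-numeric gpa should count as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_id': ['abc123', 'def321', 'qwe098', 'rty135'],
    'extra_junk': ['whoa', 'hey', "don't touch me", 'junk'],
    'gpa': ['3.1', 'junk', 'NaN', '2.75'],
})

# Replacing with the string 'NaN': pandas does not treat the text as missing
as_string = df.replace({'gpa': {'junk': 'NaN'}})
print(as_string['gpa'].isna().sum())  # 0

# Replacing with np.nan: the cells become real missing values.
# The string 'NaN' from the original data is mapped too.
as_missing = df.replace({'gpa': {'junk': np.nan, 'NaN': np.nan}})
print(as_missing['gpa'].isna().sum())  # 2

# Only numeric strings and NaN remain, so the column converts to float
gpa_float = as_missing['gpa'].astype(float)
print(gpa_float.tolist())  # [3.1, nan, nan, 2.75]

# Alternative (not from the accepted answer): coerce anything unparseable
# to NaN and get a new DataFrame in a single statement
df3 = df.assign(gpa=pd.to_numeric(df['gpa'], errors='coerce'))
print(df3['gpa'].isna().sum())  # 2
```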
