Replacing values in a data frame in Python

Question

I'm new to Python and trying to learn how to do data analysis with it. Using pandas, I read a csv file into a data frame (called "data") with pd.read_csv(). I want to recode a variable, GEND, which has three values (1, 2, 3), by replacing all instances of 3 in GEND with missing (NaN), but I can't work out how to do it. So far I've tried a for loop, which doesn't show an error but doesn't change the variable:

for value in data.GEND:
    if value == 3:
        value = np.nan

I've also tried this, which doesn't show an error, but also doesn't do anything:

data.GEND.loc[3] = np.nan

and this, which works but changes the value of the ID variable to "3", but otherwise correctly changes the value of "3" in the GEND variable to NaN:

data.GEND.replace(to_replace=3, value = nan)

What am I missing here? I'd also like to know how I can do the above but create a new column in the data frame that contains the new information (so I can keep the original values if I mess up).

dting · Accepted Answer

You can use loc to replace the 3's:

import numpy as np
import pandas as pd
df = pd.DataFrame({'GEND':[1,2,1,2,3,1,2,3,1,2,1,2,]})
df.loc[df.GEND == 3, 'GEND'] = np.nan
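One caveat with the loc approach: NaN is a float, so an integer column has to become float to hold it. Depending on your pandas version, writing NaN straight into an integer column may produce a dtype warning or be rejected. A minimal sketch that sidesteps this by casting to float first (the name mask is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"GEND": [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]})

# NaN is a float value, so make the column float before assigning it.
df["GEND"] = df["GEND"].astype("float64")

# Boolean mask: True on the rows where GEND equals 3.
mask = df["GEND"] == 3

# Assign NaN only to the masked rows of the GEND column.
df.loc[mask, "GEND"] = np.nan

n_missing = int(df["GEND"].isna().sum())
```

Unlike the for loop in the question, which only rebinds a local variable on each pass, loc writes into the data frame itself.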

Also using where you can obtain the same result:

df.GEND = df.GEND.where(df.GEND != 3)
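To keep the original values and put the recoded ones in a new column, you can assign the result to a different column name. This is only a sketch: GEND_recode is a made-up name, and the ID column is added to mirror the one mentioned in the question. Note that where and replace both return a new Series and leave the original untouched, so the result has to be assigned somewhere to have any effect.

```python
import numpy as np
import pandas as pd

# Small stand-in for the asker's data; ID and its values are assumptions.
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "GEND": [1, 2, 3, 1, 3]})

# where keeps values where the condition is True and puts NaN elsewhere.
df["GEND_recode"] = df["GEND"].where(df["GEND"] != 3)

# replace on a single column does the same job and also returns a new Series.
replaced = df["GEND"].replace(3, np.nan)

# The original columns are unchanged.
original_gend = df["GEND"].tolist()
original_id = df["ID"].tolist()
```

Calling replace on the GEND column alone, rather than on the whole data frame, means a 3 in ID or any other column is never touched.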
