I'm new to Python and trying to learn how to do data analysis with it. I have a pandas data frame (called "data") that I read in from a CSV file with pd.read_csv(). I want to recode a variable, GEND, which has three values (1, 2, 3), by replacing every 3 with missing (NaN). However, I can't find out how to do it. So far I've tried a for loop, which doesn't raise an error but doesn't change the variable either:
for value in data.GEND:
    if value == 3:
        value = np.nan
I've also tried this, which also doesn't raise an error but doesn't do anything:
data.GEND.loc[3] = np.nan
And I tried this, which correctly changes the 3s in GEND to NaN, except that it also changes the value of the ID variable to "3":
data.GEND.replace(to_replace=3, value=np.nan)
What am I missing here? I'd also like to know how to do the above but put the result in a new column of the data frame (so I can keep the original values if I make a mistake).
Upvotes: 2
Views: 2725
Reputation: 39287
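You can use loc with a boolean mask to replace the 3s:
import numpy as np
import pandas as pd

df = pd.DataFrame({'GEND': [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]})
# Select the rows where GEND == 3, in column 'GEND', and set them to NaN
df.loc[df.GEND == 3, 'GEND'] = np.nan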
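    GEND
0    1.0
1    2.0
2    1.0
3    2.0
4    NaN
5    1.0
6    2.0
7    NaN
8    1.0
9    2.0
10   1.0
11   2.0
(The column becomes float64 because NaN is a float, so the remaining values print as 1.0 and 2.0.)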
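As for why the attempts in the question don't change anything: in the for loop, value is just a name bound to each element in turn, so value = np.nan rebinds that name and never writes back to the column. And data.GEND.loc[3] selects by index label 3 (the fourth row), not by the value 3. A minimal sketch illustrating both points on the same toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'GEND': [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]})

# Rebinding the loop variable only changes the local name, not the column
for value in df.GEND:
    if value == 3:
        value = np.nan
print(df.GEND.isna().sum())  # 0: the column is unchanged

# .loc[3] looks up the row whose index label is 3, not rows whose value is 3
print(df.GEND.loc[3])  # 2, the value stored at label 3

# A boolean mask selects rows by value instead
mask = df.GEND == 3
print(list(df.index[mask]))  # [4, 7]
```

That is why the answer uses a boolean mask (df.GEND == 3) inside loc rather than a plain label.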
You can also get the same result with where, which keeps values where the condition is True and sets the rest to NaN:
df.GEND = df.GEND.where(df.GEND != 3)
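To keep the original values, as the question asks, assign the result to a new column instead of overwriting GEND. Both where and replace return a new Series rather than modifying the column in place, so their result has to be assigned somewhere to be kept. A sketch; the column names GEND_recoded and GEND_recoded2 are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'GEND': [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]})

# where() keeps values where the condition is True and puts NaN elsewhere;
# writing to a new column leaves the original GEND untouched
df['GEND_recoded'] = df['GEND'].where(df['GEND'] != 3)

# Equivalent with replace(): it also returns a new Series, so assign it
df['GEND_recoded2'] = df['GEND'].replace(3, np.nan)

print(df.head())
```

If a recode turns out wrong, the untouched GEND column is still there to start over from.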
Upvotes: 4