Reputation: 599
I have a DataFrame df1 with a gender column (sex) containing the values Female and Male. Since I want to work with a temporary DataFrame, I first copied it. See the code:
df2 = df1
gMap = {'Female': 1, 'Male': 0}
df2['sex']=df2['sex'].map(gMap)
2 problems have occurred:
And a final question: how can I change the column's data type along with the mapping, for example to integer in the code above?
Upvotes: 1
Views: 210
Reputation: 863226
First, to get a new DataFrame you need DataFrame.copy, which avoids holding a reference to the original DataFrame, so that changing one of df1 and df2 does not change the other.
If nothing matches, a possible cause is trailing whitespace in the values, so remove it with Series.str.strip:
df2 = df1.copy()
print (df2['sex'].unique())
gMap = {'Female': 1, 'Male': 0}
df2['sex']=df2['sex'].str.strip().map(gMap)
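To make the difference between plain assignment and DataFrame.copy concrete, here is a minimal sketch. The sample values (including the trailing space in 'Male ') and the names alias and copied are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; note the trailing space in 'Male '
df1 = pd.DataFrame({'sex': ['Female', 'Male ', 'Male']})

alias = df1          # plain assignment: a second name for the same object
copied = df1.copy()  # independent copy of the data

gMap = {'Female': 1, 'Male': 0}
# Strip whitespace first so that 'Male ' matches the 'Male' key
copied['sex'] = copied['sex'].str.strip().map(gMap)

print(alias is df1)            # the alias is the very same DataFrame
print(df1['sex'].tolist())     # the original strings are untouched
print(copied['sex'].tolist())  # only the copy holds the mapped values
```

Because alias and df1 are the same object, the mapping in the question (df2 = df1) also overwrote the column in df1.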
Will this change the datatype as well?
It depends.
If the only unique values in the column are Female and Male (the keys of the dictionary), a new integer column is created:
import pandas as pd
df2 = pd.DataFrame({'sex':['Male','Female','Male']})
gMap = {'Female': 1, 'Male': 0}
df2['sex']=df2['sex'].map(gMap)
print (df2)
   sex
0    0
1    1
2    0
print (df2.dtypes)
sex    int64
dtype: object
If there are other values, you get a float column, because unmatched values become missing values (NaN):
df2 = pd.DataFrame({'sex':['Male','Female','Another Val']})
gMap = {'Female': 1, 'Male': 0}
df2['sex']=df2['sex'].map(gMap)
print (df2)
   sex
0  0.0
1  1.0
2  NaN
print (df2.dtypes)
sex    float64
dtype: object
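If an integer column is wanted even when some values do not match, one option is pandas' nullable integer dtype 'Int64' (capital I), which can store missing values. This is a sketch of one possible approach, not the only one; it reuses the sample data from the example above.

```python
import pandas as pd

df2 = pd.DataFrame({'sex': ['Male', 'Female', 'Another Val']})
gMap = {'Female': 1, 'Male': 0}

# map() returns float64 because of the NaN; casting to the nullable
# 'Int64' dtype keeps integers and stores the missing value as <NA>
df2['sex'] = df2['sex'].map(gMap).astype('Int64')

print(df2)
print(df2.dtypes)
```

If every value is guaranteed to match a key, a plain .astype(int) after map is enough; with unmatched values it raises an error, because NaN cannot be converted to a regular integer.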
Upvotes: 2