Reputation: 3187
I have the following dataset:
df = pd.read_csv('c:/1/Autism_Data.arff',na_values="?")
I need to convert the columns "gender", "jundice", and "austim" into binary 0/1 values.
I would like the resulting table to show numbers instead of strings in those columns.
Upvotes: 0
Views: 417
Reputation: 143097
You can use map() to convert the values, for example df['gender'].map({'f':1, 'm':0}). A minimal working example:
import pandas as pd
df = pd.DataFrame({
    'gender':['f','m','m','f', 'f'],
    'jundice':['no','no','yes','no','no'],
    'austim':['no','yes','yes','yes','no'],
})
#print(df)
df['gender'] = df['gender'].map({'f':1, 'm':0})
df['jundice'] = df['jundice'].map({'yes':1, 'no':0})
df['austim'] = df['austim'].map({'yes':1, 'no':0})
print(df)
Result:
   gender  jundice  austim
0       1        0       0
1       0        0       1
2       0        1       1
3       1        0       1
4       1        0       0
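Since the question reads the file with na_values="?", some cells may be missing. The sketch below is one way to handle that, assuming a small made-up sample (the None entries stand in for "?" values; the real file may differ). map() leaves missing or unmapped values as NaN, which turns the column into floats, so the example converts to pandas' nullable Int64 dtype to keep whole-number 0/1 values.

```python
import pandas as pd

# Hypothetical sample data; None stands in for the "?" entries of the real file.
df = pd.DataFrame({
    'gender': ['f', 'm', None, 'f'],
    'jundice': ['no', 'yes', 'no', None],
    'austim': ['no', 'yes', 'yes', 'no'],
})

# One mapping per column, applied in a loop instead of repeating the call.
mappings = {
    'gender': {'f': 1, 'm': 0},
    'jundice': {'yes': 1, 'no': 0},
    'austim': {'yes': 1, 'no': 0},
}

for column, mapping in mappings.items():
    # Values missing from the mapping (including NaN/None) become NaN;
    # the nullable Int64 dtype stores them as <NA> next to integer 0/1.
    df[column] = df[column].map(mapping).astype('Int64')

print(df)
```

Whether to keep the missing entries, drop them, or fill them with a default depends on what the data is used for afterwards.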
Upvotes: 1
Reputation: 7210
If you'd like to be brief, you can use pd.Categorical. For example:
df['gender'] = pd.Categorical(df.gender).codes
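You can extend this to the other desired columns. The codes are assigned in sorted (alphabetical) order, so here 'f' becomes 0 and 'm' becomes 1; check that this matches the encoding you want and remap the values if it does not. Alternatively, if you would like some more control (for example, reusing the same encoding on new data or reversing it later), you can use LabelEncoder from scikit-learn: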
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['gender'] = le.fit_transform(df.gender)
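To illustrate the point about ordering, here is a small sketch using made-up sample data. It compares the default codes from pd.Categorical with codes produced when the category order is passed explicitly through the categories argument. Note that missing values receive the code -1 with this approach.

```python
import pandas as pd

# Hypothetical sample column for illustration.
df = pd.DataFrame({'gender': ['f', 'm', 'm', 'f', 'f']})

# Default: categories are sorted, so 'f' -> 0 and 'm' -> 1.
default_codes = pd.Categorical(df['gender']).codes

# Explicit order: 'm' -> 0 and 'f' -> 1.
explicit_codes = pd.Categorical(df['gender'], categories=['m', 'f']).codes

print(list(default_codes))
print(list(explicit_codes))
```

Passing the order explicitly makes the encoding independent of how the labels happen to sort.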
Upvotes: 2