Reputation: 117
I'm currently working on a pandas.DataFrame in which I need to convert some categorical variables into dummies.
However, since I build my pandas.DataFrame from a portion of a large database, I'm pretty sure that I'll miss some modalities if I simply use pd.get_dummies.
Fortunately, I retrieved all the modalities from the features that I need to convert.
I'd like to know whether it's possible (using pd.get_dummies or not) to efficiently convert my variables based on the modalities that I retrieved.
I looked for a solution, with and without get_dummies, but didn't find one.
Thanks :)
Upvotes: 0
Views: 1110
Reputation: 18647
IIUC, you can use the pandas.Categorical dtype to handle this.
# Setup
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(['A', 'B', 'C'], 6), columns=['cat'])
print(df)
[out]
  cat
0   A
1   B
2   A
3   B
4   B
5   C
And running pandas.get_dummies on this yields...
# dtype=int keeps the 0/1 output on pandas >= 2.0, where the default dtype is bool
pd.get_dummies(df['cat'], dtype=int)
[out]
   A  B  C
0  1  0  0
1  0  1  0
2  1  0  0
3  0  1  0
4  0  1  0
5  0  0  1
Now cast this Series to a categorical dtype, passing in the list of known categories...
# Full list of known categories, including ones absent from this sample
categories = ['A', 'B', 'C', 'D', 'E']
df['cat'] = pd.Categorical(df['cat'], categories=categories)
pd.get_dummies(df['cat'], dtype=int)
[out]
   A  B  C  D  E
0  1  0  0  0  0
1  0  1  0  0  0
2  1  0  0  0  0
3  0  1  0  0  0
4  0  1  0  0  0
5  0  0  1  0  0
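Since the question mentions several variables, the same idea can probably be applied to many columns at once. The following is only a minimal sketch: the column names, the sample values and the known_modalities dict are hypothetical, standing in for whatever was retrieved from the full database.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample drawn from a larger database (names and values are illustrative).
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [1.0, 2.5, 3.0],
})

# Assumed to be retrieved beforehand: every modality of each feature to encode.
known_modalities = {
    "color": ["red", "green", "blue"],
    "size": ["S", "M", "L"],
}

# Cast each column to a categorical dtype that declares all known modalities.
for col, cats in known_modalities.items():
    df[col] = df[col].astype(CategoricalDtype(categories=cats))

# get_dummies emits one column per declared category, whether observed or not.
dummies = pd.get_dummies(df, columns=list(known_modalities), dtype=int)
print(dummies)
```

One caveat: a value that is not in the declared list becomes NaN during the cast, so with the default dummy_na=False its row ends up with zeros in every dummy column for that feature.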
Upvotes: 3