JBSH

Reputation: 117

Is there a way to convert a categorical variable into dummies with a dict in pandas?

I'm currently working on a pandas.DataFrame in which I need to convert some categorical variables into dummies.

However, since I build my pandas.DataFrame from a portion of a large database, I'm pretty sure that I'll miss some modalities if I simply use pd.get_dummies.

Fortunately, I retrieved all the modalities from the features that I need to convert.

I wanted to know whether it's possible (using pd.get_dummies or not) to efficiently convert my variables based on the modalities that I retrieved.

I looked for a solution, with and without get_dummies, but didn't find one.

Thanks :)

Upvotes: 0

Views: 1110

Answers (1)

Chris Adams

Reputation: 18647

IIUC, you can use the pandas Categorical dtype (pd.Categorical) to handle this.

Example

# Setup
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the sample below is reproducible
df = pd.DataFrame(np.random.choice(['A', 'B', 'C'], 6), columns=['cat'])
print(df)

[out]

  cat
0   A
1   B
2   A
3   B
4   B
5   C

And running pandas.get_dummies on this yields...

pd.get_dummies(df['cat'])

[out]

   A  B  C
0  1  0  0
1  0  1  0
2  1  0  0
3  0  1  0
4  0  1  0
5  0  0  1    

Now cast this Series to categorical dtype, and pass in the list of known categories...

categories = ['A', 'B', 'C', 'D', 'E']
df['cat'] = pd.Categorical(df['cat'], categories=categories)

pd.get_dummies(df['cat'])

[out]

   A  B  C  D  E
0  1  0  0  0  0
1  0  1  0  0  0
2  1  0  0  0  0
3  0  1  0  0  0
4  0  1  0  0  0
5  0  0  1  0  0
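
Since the question mentions a dict, here is a minimal sketch of the same idea applied to several columns at once. It assumes the known modalities are stored in a dict mapping column name to the full list of categories; the `known_categories` dict, the `size` column and the sample values are hypothetical and only there for illustration. Passing `dtype=int` keeps the 0/1 output shown above, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical dict: column name -> every modality known for that column.
known_categories = {
    "cat": ["A", "B", "C", "D", "E"],
    "size": ["S", "M", "L"],
}

# Illustrative sample that does not contain every modality.
df = pd.DataFrame({
    "cat": ["A", "B", "A", "B", "B", "C"],
    "size": ["S", "S", "L", "S", "L", "L"],
})

# Cast each listed column to a categorical dtype carrying its full category list.
dtypes = {
    col: pd.CategoricalDtype(categories=cats)
    for col, cats in known_categories.items()
}
df = df.astype(dtypes)

# get_dummies emits one column per declared category, observed or not.
dummies = pd.get_dummies(df, columns=list(known_categories), dtype=int)
print(dummies.columns.tolist())
```

One thing to watch for: a value that is not in the list for its column becomes NaN during the cast, so that row ends up with all zeros for that variable unless you pass dummy_na=True.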

Upvotes: 3
