Reputation: 2385
I have 6 columns in my dataframe. 2 of them have about 3K unique values. When I use get_dummies()
on the entire dataframe, or on just one of those 2 columns, what gets returned is the exact same column with 3K values. get_dummies
fails to dummy-fy the bigger columns.
Some columns do get one-hot encoded, but the big ones don't.
I wonder if get_dummies only works on columns with small cardinality.
I believe this was also discussed here: Need help with python(pandas) script
Upvotes: 4
Views: 4853
Reputation: 294488
It appears to work as intended for me.
Consider the series s
of random 3-character strings:
import pandas as pd
import numpy as np
from string import ascii_lowercase  # named `lowercase` in Python 2

np.random.seed([3, 1415])
# concatenate 3 random letters per row into one string
s = pd.DataFrame(np.random.choice(list(ascii_lowercase), (10000, 3))).sum(1)
s.nunique()
7583
Then build the dummy dataframe df:
df = s.str.get_dummies()
df.shape
(10000, 7583)
df.sum(1).describe()
count 10000.0
mean 1.0
std 0.0
min 1.0
25% 1.0
50% 1.0
75% 1.0
max 1.0
dtype: float64
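The same holds for the top-level pd.get_dummies on a whole dataframe, which is closer to the asker's setup. A minimal sketch (the column names 'code' and 'val' and the sizes are made up for illustration): a high-cardinality string column still gets one indicator column per unique value, while non-targeted columns pass through untouched.

```python
import numpy as np
import pandas as pd
from string import ascii_lowercase

np.random.seed(0)
# hypothetical high-cardinality column: random 3-letter codes
codes = pd.Series([''.join(np.random.choice(list(ascii_lowercase), 3))
                   for _ in range(1000)])
df = pd.DataFrame({'code': codes, 'val': np.arange(1000)})

# one-hot encode only the 'code' column; 'val' is kept as-is
dummies = pd.get_dummies(df, columns=['code'])

# one indicator column per unique code, plus the untouched 'val' column
print(dummies.shape[1] == df['code'].nunique() + 1)
```

If the encoded frame gets too wide to hold in memory, passing sparse=True to pd.get_dummies stores the indicators as sparse columns.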
Upvotes: 4