Felicia.H

Reputation: 361

pandas get_dummies on high-cardinality variables using one-hot encoding creates too many new features

I have several high-cardinality variables in a dataset and want to convert them into dummies. All of them have more than 500 levels. When I used pandas get_dummies, the resulting matrix was so large that my program crashed:

import pandas as pd

# Even with a sparse result, this still creates one dummy column per level
pd.get_dummies(data, sparse=True, drop_first=True, dummy_na=True)

I don't know a better way to handle high-cardinality variables besides one-hot encoding, but it increases the size of the data so much that it no longer fits in memory. Does anyone have a better solution?

Upvotes: 0

Views: 1393

Answers (1)

abrocod

Reputation: 31

  • Method 1: For non-linear algorithms like random forests (RF), you can replace a categorical variable with the number of times each level appears in the train set. This turns it into a single numeric feature (see the count-encoding sketch below).

  • Method 2: If you can make one-hot encoding fit into memory, consider applying one-hot encoding first and then a dimensionality reduction method (like PCA) or an embedding method (word2vec, etc.) to reduce the dimension before feeding the result into any ML algorithm (see the second sketch below).
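A minimal sketch of Method 1 (count encoding); the column name city and the toy data are made up for illustration:

import pandas as pd

train = pd.DataFrame({"city": ["NY", "NY", "LA", "SF", "LA", "NY"]})
test = pd.DataFrame({"city": ["LA", "SF", "Boston"]})

# Map each level to how often it appears in the train set
counts = train["city"].value_counts()
train["city_count"] = train["city"].map(counts)

# Levels unseen in train map to NaN; filling with 0 is one reasonable choice
test["city_count"] = test["city"].map(counts).fillna(0)

And a sketch of Method 2, assuming scikit-learn is available; TruncatedSVD is used here as the PCA stand-in because it accepts sparse input without densifying it:

import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Toy stand-in for a high-cardinality column
data = pd.DataFrame({"cat": [f"level_{i % 500}" for i in range(2000)]})

# One-hot encode into a sparse representation to keep memory down
dummies = pd.get_dummies(data, sparse=True, drop_first=True)

# Convert to a scipy sparse matrix so the reduction step never densifies
X = dummies.astype(pd.SparseDtype("float", 0)).sparse.to_coo().tocsr()

# Reduce the ~500 dummy columns to 50 components
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)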

There is more discussion here: https://www.kaggle.com/general/16927

Upvotes: 2
