Reputation: 123
So essentially I have a data frame with a bunch of columns, some of which I want to keep (stored in to_keep), and others that I want to turn into categorical (dummy) variables using pandas.get_dummies (stored in to_change).
However, I can't get the syntax right, and the examples I have seen (e.g. here: http://blog.yhat.com/posts/logistic-regression-and-python.html) don't seem to help.
Here's what I have at present:
new_df = df.copy()
dummies = pd.get_dummies(new_df[to_change])
new_df = new_df[to_keep].join(dummies)
return new_df
Any help on where I am going wrong would be appreciated, as the problem I keep running into is that this only adds categorical variables for the first column in to_change.
Upvotes: 1
Views: 510
Reputation: 76297
I must say I didn't completely understand the problem.
However, say your DataFrame is df, and you have a list of columns to_make_categorical.
The DataFrame with the non-categorical columns is
wo_categoricals = df[[c for c in list(df.columns) if c not in to_make_categorical]]
The DataFrames of the categorical expansions are
categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]
Now you could just concat them horizontally:
pd.concat([wo_categoricals] + categoricals, axis=1)
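To make the steps above concrete, here is a minimal end-to-end sketch. The sample data and the column names (age, city, color) are made up purely for illustration; only df and to_make_categorical come from the answer. As an alternative, pd.get_dummies also accepts a columns argument, which should give the same column layout in a single call.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["NY", "LA", "NY"],
    "color": ["red", "blue", "red"],
})
to_make_categorical = ["city", "color"]

# Columns that are left untouched.
wo_categoricals = df[[c for c in df.columns if c not in to_make_categorical]]

# One dummy-encoded DataFrame per categorical column, prefixed with its name.
categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]

# Glue everything together side by side.
result = pd.concat([wo_categoricals] + categoricals, axis=1)
print(list(result.columns))

# Alternative: let get_dummies handle only the listed columns in one call.
alt = pd.get_dummies(df, columns=to_make_categorical)
print(list(alt.columns))
```

Both approaches leave the untouched columns first, followed by one indicator column per distinct value of each encoded column.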
Upvotes: 2