Reputation: 1076
I want to convert a particular categorical variable into dummy variables using pd.get_dummies() for both the train and test data. Instead of doing it for each separately, I used a for loop. However, the following code does not work, and .head() returns the same, unchanged dataset.
combine = [train_data, test_data]
for dataset in combine:
    dummy_col = pd.get_dummies(dataset['targeted_sex'])
    dataset = pd.concat([dataset, dummy_col], axis = 1)
    dataset.drop('targeted_sex', axis = 1, inplace = True)
train_data.head() # does not change
Even if I loop over the list indices like this, it still doesn't work:
for i in range(len(combine)):
Can I get some help? Also, pandas' get_dummies() doesn't provide an inplace option.
Upvotes: 1
Views: 2094
Reputation: 75080
For easier referencing, I would use a dict.
Create a dictionary of train and test:
combine = {'train_data': train_data, 'test_data': test_data}
Then build the transformed frames with a dict comprehension:
new_combine = {k: pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis=1)
                   .drop(columns='targeted_sex')  # positional axis (drop(..., 1)) no longer works in pandas 2.x
               for k, dataset in combine.items()}
Now print the train and test data by referencing their keys:
print(new_combine['train_data']) #same for test
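Below is a self-contained sketch of the dict approach. The toy frames and their values ('age', 'Male', 'Female') are hypothetical; only the column name 'targeted_sex' comes from the question. It also rebinds the original names at the end, which is only needed if later code still uses train_data and test_data directly.

```python
import pandas as pd

# Hypothetical toy frames; only 'targeted_sex' is taken from the question.
train_data = pd.DataFrame({'age': [25, 40, 31], 'targeted_sex': ['Male', 'Female', 'Male']})
test_data = pd.DataFrame({'age': [22, 35], 'targeted_sex': ['Female', 'Male']})

combine = {'train_data': train_data, 'test_data': test_data}

# Each value becomes a fresh DataFrame: dummy columns appended,
# original categorical column dropped.
new_combine = {
    k: pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis=1)
         .drop(columns='targeted_sex')
    for k, dataset in combine.items()
}

# Rebind the original names if the rest of the code expects them.
train_data = new_combine['train_data']
test_data = new_combine['test_data']

print(train_data.head())
```

Note that get_dummies returns bool columns in recent pandas versions (uint8 in older ones), so cast with .astype(int) if you need 0/1 integers.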
Upvotes: 1
Reputation: 743
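You need to print dataset.head() instead of train_data.head(). pd.concat returns a new DataFrame, so inside the loop the name dataset is rebound to that new object, and train_data itself is never modified.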
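A small sketch of what happens, using hypothetical toy frames in place of the question's data. After the loop, dataset only holds the transformed copy of the last frame (test_data), so to keep both results they have to be assigned back to the original names, for example by unpacking a list comprehension:

```python
import pandas as pd

# Hypothetical toy data standing in for the question's train_data / test_data.
train_data = pd.DataFrame({'targeted_sex': ['Male', 'Female']})
test_data = pd.DataFrame({'targeted_sex': ['Female', 'Male']})

combine = [train_data, test_data]
for dataset in combine:
    # pd.concat builds a NEW DataFrame; this line only rebinds the loop variable.
    dataset = pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis=1)
    # inplace=True here modifies that new frame, not the one stored in combine.
    dataset.drop('targeted_sex', axis=1, inplace=True)

print(list(train_data.columns))  # ['targeted_sex'] -- unchanged
print(list(dataset.columns))     # ['Female', 'Male'] -- transformed copy of test_data only

# One fix: build the new frames and rebind both names explicitly.
train_data, test_data = [
    pd.concat([d, pd.get_dummies(d['targeted_sex'])], axis=1).drop(columns='targeted_sex')
    for d in combine
]
print(list(train_data.columns))  # ['Female', 'Male']
```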
You can use this function:
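import pandas as pd

# df: the DataFrame to transform
# todummy_list: list of column names to replace with dummy columns
def dummy_df(df, todummy_list):
    for x in todummy_list:
        # one indicator column per category, named e.g. "<column>_<value>"
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(columns=x)  # drop(x, 1) fails in pandas 2.x
        df = pd.concat([df, dummies], axis=1)
    return df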
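A possible way to apply the function to both frames, sketched with hypothetical toy data. Because dummy_df returns a new DataFrame, both names are reassigned at once. The final reindex step is an addition not in the answer above: a test set can lack categories seen in training, and reindexing to the training columns adds any missing dummy column filled with 0 (and drops dummy columns that never appeared in training).

```python
import pandas as pd

def dummy_df(df, todummy_list):
    # Same function as above, using columns= instead of a positional axis.
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(columns=x)
        df = pd.concat([df, dummies], axis=1)
    return df

# Hypothetical toy frames; only 'targeted_sex' is taken from the question.
train_data = pd.DataFrame({'age': [25, 40], 'targeted_sex': ['Male', 'Female']})
test_data = pd.DataFrame({'age': [30], 'targeted_sex': ['Female']})

# Reassign both names, so the original variables really hold the new frames.
train_data, test_data = (dummy_df(d, ['targeted_sex']) for d in (train_data, test_data))

# Align test columns with train: missing dummy columns are added as 0.
test_data = test_data.reindex(columns=train_data.columns, fill_value=0)

print(test_data)
```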
Upvotes: 0