Prakhar Rathi

Reputation: 1076

Creating dummy variables using pd.get_dummies in a for loop in Python

I want to convert a particular categorical variable into dummy variables using pd.get_dummies() for both the train and test data. Instead of doing it for each one separately, I used a for loop. However, the following code does not work: train_data.head() returns the unchanged dataset.

combine = [train_data, test_data]
for dataset in combine:
    dummy_col = pd.get_dummies(dataset['targeted_sex'])
    dataset = pd.concat([dataset, dummy_col], axis = 1)
    dataset.drop('targeted_sex', axis = 1, inplace = True)

train_data.head() # does not change

Even if I use an iterator which traverses the index like this, it still doesn't work.

for i in range(len(combine)):

Can I get some help? Also, Pandas get_dummies() doesn't provide an inplace option.

Upvotes: 1

Views: 2094

Answers (2)

anky

Reputation: 75080

To keep the results easy to reference, I would use a dict:

Create a dictionary of train and test:

combine={'train_data':train_data,'test_data':test_data}

Use this code which uses a dict comprehension:

new_combine={k:pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis = 1)
                            .drop('targeted_sex', axis = 1) for k,dataset in combine.items()}

Print test and train now by referencing the keys:

print(new_combine['train_data']) #same for test
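As a self-contained illustration of the dict approach above, here is a minimal sketch. The toy frames and the `age` column are made up for the example; only `targeted_sex` comes from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real train/test frames.
train_data = pd.DataFrame({"age": [25, 32, 41],
                           "targeted_sex": ["male", "female", "male"]})
test_data = pd.DataFrame({"age": [29, 37],
                          "targeted_sex": ["female", "male"]})

combine = {"train_data": train_data, "test_data": test_data}

# Build a new dict of transformed frames; the originals are left untouched.
new_combine = {
    name: pd.concat([df, pd.get_dummies(df["targeted_sex"])], axis=1)
            .drop("targeted_sex", axis=1)
    for name, df in combine.items()
}

print(new_combine["train_data"].head())
```

Note that train_data and test_data themselves still contain the targeted_sex column afterwards; the transformed frames live only in new_combine unless you assign them back.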

Upvotes: 1

talatccan

Reputation: 743

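train_data.head() does not change because dataset = pd.concat(...) inside the loop rebinds the loop variable to a new DataFrame and leaves train_data untouched. Print dataset.head() instead of train_data.head() to see the transformed data.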
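If you want to keep the loop from the question, one possible fix is to store each new frame back into the list by index and then unpack the list into the original names. This is only a sketch with made-up toy data (the `age` column is hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the real train/test frames.
train_data = pd.DataFrame({"age": [25, 32],
                           "targeted_sex": ["male", "female"]})
test_data = pd.DataFrame({"age": [29, 37],
                          "targeted_sex": ["female", "male"]})

combine = [train_data, test_data]

for i, dataset in enumerate(combine):
    dummy_col = pd.get_dummies(dataset["targeted_sex"])
    # Write the new frame back into the list; assigning to `dataset`
    # alone would only rebind the loop variable.
    combine[i] = pd.concat([dataset, dummy_col], axis=1).drop("targeted_sex", axis=1)

# Rebind the original names to the transformed frames.
train_data, test_data = combine

print(train_data.head())
```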

You can use this function:

df: dataframe
todummy_list: list of column names which will be dummies

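def dummy_df(df, todummy_list):
    for x in todummy_list:
        # One indicator column per category, named like "<column>_<value>"
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        # axis must be passed by keyword; drop(x, 1) fails on pandas 2.x
        df = df.drop(x, axis=1)
        df = pd.concat([df, dummies], axis=1)
    return df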
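A possible way to call this function on both datasets is sketched below. The function is repeated so the snippet runs on its own; the toy frames and the `age` column are assumptions for illustration.

```python
import pandas as pd

def dummy_df(df, todummy_list):
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(x, axis=1)
        df = pd.concat([df, dummies], axis=1)
    return df

# Hypothetical toy data standing in for the real train/test frames.
train_data = pd.DataFrame({"age": [25, 32, 41],
                           "targeted_sex": ["male", "female", "male"]})
test_data = pd.DataFrame({"age": [29, 37],
                          "targeted_sex": ["female", "male"]})

# The function returns a new frame, so assign the result back.
train_data = dummy_df(train_data, ["targeted_sex"])
test_data = dummy_df(test_data, ["targeted_sex"])

print(train_data.columns.tolist())
```

Because the function returns a new DataFrame rather than modifying its argument, the result has to be assigned back to the name you want to update.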

Upvotes: 0
