Rocketq

Reputation: 5781

How can I make the column names in a DataFrame unique?

I wanted to apply one-hot encoding (the details aren't important for this question) to my dataframe like this:

train = pd.concat([train, pd.get_dummies(train['Canal_ID'])], axis=1, join_axes=[train.index])
train.drop([11,'Canal_ID'],axis=1, inplace = True)

train = pd.concat([train, pd.get_dummies(train['Agencia_ID'])], axis=1, join_axes=[train.index])
train.drop([1382,'Agencia_ID'],axis=1, inplace = True)

Unfortunately, the original dataframe contains numbers as values, so after creating the dummy variables there are many columns with the same name. How can I make the column names unique?

Upvotes: 2

Views: 123

Answers (3)

Merlin

Reputation: 25629

Try this: get_dummies has a "prefix" parameter, which is prepended to each generated column name:

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                    'C': [1, 2, 3]})

pd.get_dummies(df, prefix=['col1', 'col2'])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1
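
Applied to the question's case, this might look like the sketch below. The small `train` frame and its `y` column are made-up stand-ins for the real data. Only the `Canal_ID` and `Agencia_ID` column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's `train` frame
train = pd.DataFrame({'Canal_ID': [1, 2, 1],
                      'Agencia_ID': [1, 1, 2],
                      'y': [10, 20, 30]})

# Encode both ID columns in one call; each dummy column is named
# "<prefix>_<value>", so equal values in different columns cannot clash
encoded = pd.get_dummies(train,
                         columns=['Canal_ID', 'Agencia_ID'],
                         prefix=['Canal_ID', 'Agencia_ID'])

print(list(encoded.columns))
# ['y', 'Canal_ID_1', 'Canal_ID_2', 'Agencia_ID_1', 'Agencia_ID_2']
```

With `columns=` the original ID columns are replaced by their dummies, so no separate concat or drop of the source columns is needed.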

Upvotes: 1

jezrael

Reputation: 862511

You can replace the column names with sequential integers, using range and the column count from shape:

df.columns = range(df.shape[1])

Sample:

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

print (df.shape)
(3, 6)

df.columns = range(df.shape[1])
print (df)
   0  1  2  3  4  5
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
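
To see the effect on duplicated names specifically, here is a minimal sketch. The two small series are made up for illustration, and `Index.is_unique` is used to check the result.

```python
import pandas as pd

# Two dummy blocks built from numeric IDs both get the labels 1 and 2
left = pd.get_dummies(pd.Series([1, 2, 1]))
right = pd.get_dummies(pd.Series([2, 1, 2]))
df = pd.concat([left, right], axis=1)

before = df.columns.is_unique   # False: the labels are 1, 2, 1, 2
df.columns = range(df.shape[1])
after = df.columns.is_unique    # True: the labels are 0, 1, 2, 3

print(before, after)
```

Note that the new labels are purely positional, so they no longer tell you which original column or value a dummy came from.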

Upvotes: 1

kiril

Reputation: 5202

I would append a random number to each original column name.

from random import randint

# Append a random suffix to each column name (collisions are still possible)
new_cols = train.columns.map(lambda x: "{}-{}".format(x, randint(0, 100)))
train.columns = new_cols
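
Since a random suffix can repeat, a deterministic variant may be safer. The sketch below swaps the random number for a running counter per name. `dedupe_columns` is a hypothetical helper, not a pandas function, and the sample frame is made up.

```python
from collections import Counter

import pandas as pd


def dedupe_columns(columns):
    """Hypothetical helper: keep the first occurrence of a name as-is
    and suffix later repeats with -1, -2, ..."""
    seen = Counter()
    result = []
    for name in columns:
        count = seen[name]
        result.append(name if count == 0 else "{}-{}".format(name, count))
        seen[name] += 1
    return result


# Made-up frame with the label 1 appearing twice
train = pd.DataFrame([[1, 2, 3, 4]], columns=[1, 2, 1, 'x'])
train.columns = dedupe_columns(train.columns)

print(list(train.columns))
# [1, 2, '1-1', 'x']
```

This assumes no existing column is already named like a generated one (for example '1-1'). If that can happen, the result should be checked with `train.columns.is_unique`.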

Upvotes: 1
