Eisen

Reputation: 1897

Anonymizing column names

I have a dataframe like this:

IsCool IsTall IsHappy Target
0      1      0       1
1      1      0       0
0      1      0       0
1      0      1       1

I want to anonymize all of the column names except for Target. How can I do this?

Expected output:

col1   col2   col3    Target
0      1      0       1
1      1      0       0
0      1      0       0
1      0      1       1

Source dataframe:

import pandas as pd

df = pd.DataFrame({"IsCool": [0, 1, 0, 1], 
                   "IsTall": [1, 1, 1, 0], 
                   "IsHappy": [0, 0, 0, 1], 
                   "Target": [1, 0, 0, 1]})

Upvotes: 0

Views: 90

Answers (3)

Laurent B.

Reputation: 2273

Proposed code:

You can pass a dict that maps old names to new names to the columns parameter of DataFrame.rename(), like this:

columns={'IsCool': 'col0', 'IsTall': 'col1', 'IsHappy': 'col2'}

This dict can be built with zip: dict(zip(keys, values))

import pandas as pd

df = pd.DataFrame({"IsCool": [0, 1, 0, 1], 
                   "IsTall": [1, 1, 1, 0], 
                   "IsHappy": [0, 0, 0, 1], 
                   "Target": [1, 0, 0, 1]})


df = df.rename(columns = dict(zip(df.columns.drop('Target'), 
                                  ["col%s"%i for i in range(len(df.columns)-1)])))

print(df)

Result (the numbering starts at col0 rather than col1):

   col0  col1  col2  Target
0     0     1     0       1
1     1     1     0       0
2     0     1     0       0
3     1     0     1       1
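
If the names should start at col1, as in the expected output of the question, one possible variation is to number the columns with enumerate(..., start=1). This is only a sketch built on the same rename() idea; the names features and mapping are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"IsCool": [0, 1, 0, 1],
                   "IsTall": [1, 1, 1, 0],
                   "IsHappy": [0, 0, 0, 1],
                   "Target": [1, 0, 0, 1]})

# Columns to anonymize, kept in their original order
features = df.columns.drop("Target")

# start=1 yields col1, col2, ... instead of col0, col1, ...
mapping = {old: f"col{i}" for i, old in enumerate(features, start=1)}

# rename() returns a new dataframe; "Target" is not in the mapping, so it is untouched
out = df.rename(columns=mapping)
print(out)
```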

Upvotes: 0

mozway

Reputation: 262484

You can use:

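# get all columns except the excluded ones (here "Target");
# sort=False keeps the original column order (difference sorts by default)
cols = df.columns.difference(['Target'], sort=False)
# build the new names: col1, col2, ...
names = 'col' + pd.Series(range(1, len(cols)+1), index=cols).astype(str)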

out = df.rename(columns=names)

Output:

   col1  col2  col3  Target
0     0     1     0       1
1     1     1     0       0
2     0     1     0       0
3     1     0     1       1
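
One detail worth knowing about Index.difference: by default it returns the remaining labels in sorted order, so the numbering follows alphabetical order unless sort=False is passed. The short check below, using the dataframe from the question, illustrates the difference; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"IsCool": [0, 1, 0, 1],
                   "IsTall": [1, 1, 1, 0],
                   "IsHappy": [0, 0, 0, 1],
                   "Target": [1, 0, 0, 1]})

# Default behaviour: the result is sorted, so IsHappy comes before IsTall
sorted_cols = df.columns.difference(["Target"])

# sort=False keeps the columns in the order they appear in the dataframe
ordered_cols = df.columns.difference(["Target"], sort=False)

print(list(sorted_cols))
print(list(ordered_cols))
```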

Upvotes: 1

Chrysophylaxs

Reputation: 6583

What about:

cols = {
    col: f"col{i + 1}" if col != "Target" else col
    for i, col in enumerate(df.columns)
}

out = df.rename(columns=cols)

Output:

   col1  col2  col3  Target
0     0     1     0       1
1     1     1     0       0
2     0     1     0       0
3     1     0     1       1

You can also do it in place:

cols = [
    f"col{i + 1}" if col != "Target" else col
    for i, col in enumerate(df.columns)
]

df.columns = cols
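
Because enumerate counts every column, including "Target", the numbering above is only gap-free when "Target" is the last column. A possible generalization is sketched below: it numbers only the columns that are actually renamed. The helper name anonymize_columns and its keep and prefix parameters are hypothetical, introduced here for illustration.

```python
import pandas as pd

def anonymize_columns(df, keep=("Target",), prefix="col"):
    """Return a copy of df where every column not in `keep` is renamed
    to prefix1, prefix2, ... in order of appearance (hypothetical helper)."""
    to_rename = [c for c in df.columns if c not in keep]
    mapping = {old: f"{prefix}{i}" for i, old in enumerate(to_rename, start=1)}
    return df.rename(columns=mapping)

# "Target" is deliberately placed first to show that the numbering has no gap
df = pd.DataFrame({"Target": [1, 0], "IsCool": [0, 1], "IsTall": [1, 1]})
out = anonymize_columns(df)
print(out)
```

Since rename() returns a new dataframe, df itself keeps its original column names.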

Upvotes: 2
