Reputation: 3677
I have a df with a column that looks like this:
id
11
22
22
333
33
333
This column is sensitive data. I want to replace each value with a random number, but every occurrence of the same ID should get the same random number.
For example, I want to mask the data in the column like so:
id
123
987
987
456
00
456
Note that the same IDs have the same value. How do I achieve this? I have thousands of IDs.
Upvotes: 2
Views: 2804
Reputation: 1491
I would suggest something like this (but it will not work properly: it draws a value independently for each group, so the same random value can be assigned to different original IDs):
from random import randint
# randint is called once per group, so all rows of an ID share one value
df['id_rand'] = df.groupby('id')['id'].transform(lambda x: randint(1, 1000))
>>> df
id id_rand
0 11 833
1 22 577
2 22 577
3 333 101
4 33 723
5 333 101
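The collision problem noted above comes from drawing each value independently. A minimal sketch that avoids it, assuming the same `id` column and keeping the 1..1000 range from the snippet (the range is an assumption), draws all replacement values at once with `random.sample`, which samples without replacement, so distinct IDs always get distinct values:

```python
import random

import pandas as pd

df = pd.DataFrame({'id': [11, 22, 22, 333, 33, 333]})

# One replacement value per distinct ID, drawn without replacement from 1..1000
unique_ids = df['id'].unique()
random_vals = random.sample(range(1, 1001), k=len(unique_ids))

# Map every row through the same lookup so repeated IDs share a value
mapping = dict(zip(unique_ids, random_vals))
df['id_rand'] = df['id'].map(mapping)
print(df)
```

`random.sample` raises `ValueError` if the range is smaller than the number of distinct IDs, so with thousands of IDs the range has to be widened accordingly.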
Upvotes: 1
Reputation: 25
My idea:
from random import shuffle

my_col = 'your_sensitive_col_name'  # column holding the IDs (int type)
initial_unique_vals = df[my_col].unique()
new_values = list(range(len(initial_unique_vals)))
shuffle(new_values)  # random one-to-one assignment of codes to IDs
dict_init_new_values = dict(zip(initial_unique_vals, new_values))
df[my_col] = df[my_col].map(dict_init_new_values)
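Applied to the sample data from the question, this approach turns the four distinct IDs into a shuffled permutation of 0..3. The sketch below wraps the steps in a hypothetical helper `mask_ids` (the name and its `seed` parameter are assumptions, added only so the result is reproducible) and uses a local `random.Random` instead of the module-level `shuffle`:

```python
import random

import pandas as pd

def mask_ids(s: pd.Series, seed=None) -> pd.Series:
    """Hypothetical helper: replace each distinct value in s with a shuffled code 0..n-1."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    unique_vals = list(s.unique())
    new_values = list(range(len(unique_vals)))
    rng.shuffle(new_values)  # random one-to-one assignment of codes to IDs
    return s.map(dict(zip(unique_vals, new_values)))

df = pd.DataFrame({'id': [11, 22, 22, 333, 33, 333]})
df['id_masked'] = mask_ids(df['id'], seed=0)
print(df)
```

Because the codes are a permutation of 0..n-1, the masked column reveals how many distinct IDs exist but not which original value each code stands for.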
Upvotes: 2
Reputation: 262484
Here are two options: generate either a categorical ID (non-random, id2) or a unique random value per original ID (id3). In both cases we can use pandas.factorize (or alternatively unique, or pandas.Categorical).
import numpy as np
import pandas as pd

# enumerated categorical
df['id2'] = pd.factorize(df['id'])[0]
# random categorical
codes, ids = pd.factorize(df['id'])
# replace=False draws distinct values, so different IDs never collide
d = dict(zip(ids, np.random.choice(range(1000), size=len(ids), replace=False)))
df['id3'] = df['id'].map(d)
# alternative 1
ids = df['id'].unique()
d = dict(zip(ids, np.random.choice(range(1000), size=len(ids), replace=False)))
df['id3'] = df['id'].map(d)
# alternative 2
df['id3'] = pd.Categorical(df['id'])
new_ids = np.random.choice(range(1000), size=len(df['id3'].cat.categories), replace=False)
df['id3'] = df['id3'].cat.rename_categories(new_ids)
Output:
id id2 id3
0 11 0 395
1 22 1 428
2 22 1 428
3 333 2 528
4 33 3 783
5 333 2 528
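Since `pd.factorize` also returns integer codes that index into the uniques, the random values can be looked up by position instead of through a dict. A minimal sketch, keeping the 0..999 range from the answer and using NumPy's `default_rng` with an arbitrary seed of 42 (the seed is an assumption, only for reproducibility):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [11, 22, 22, 333, 33, 333]})

# codes[i] is the position of df['id'][i] within uniques
codes, uniques = pd.factorize(df['id'])

# Draw one distinct random value per unique ID (sampling without replacement)
rng = np.random.default_rng(42)
new_ids = rng.choice(1000, size=len(uniques), replace=False)

# Indexing by code gives every row the value assigned to its ID
df['id3'] = new_ids[codes]
print(df)
```

One caveat with positional lookup: `factorize` gives missing values the code -1, and `new_ids[-1]` would silently pick the last drawn value, so a column containing NaN needs to be handled separately.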
Upvotes: 1