Hiwot

Reputation: 608

Anonymise pandas data frame column

Let's say I have the following data frame:

Person_info
(Bob, 2)
(John, 1)
(Bek, 10)
(Bob, 6)

I would like to anonymise the names while keeping their values, so that the same person always gets the same label:

Person_info
(Person 1, 2)
(Person 2, 1)
(Person 3, 10)
(Person 1, 6)

I found a simple way to anonymise a column here, but it doesn't produce the result I want.

Can anyone help with this in pandas (Python)?

Upvotes: 2

Views: 540

Answers (2)

Prayson W. Daniel

Reputation: 15608

Cast your names to category and get the category codes ;)

import pandas as pd

dataf = pd.DataFrame(
    [('Bob', 2),
     ('John', 1),
     ('Bek', 10),
     ('Bob', 6)], columns=['name', 'valuex'])

# Category codes are 0-based and follow the sorted order of the unique names
dataf["name"] = dataf["name"].astype("category").cat.codes.map(lambda x: f"Person {x}")

print(dataf)

Output:

       name  valuex
0  Person 1       2
1  Person 2       1
2  Person 0      10
3  Person 1       6
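
Note that category codes start at 0 and follow the sorted order of the names, which is why Bek becomes Person 0. If the labels should instead start at 1 and follow the order of first appearance, as in the question's expected output, one option is pd.factorize. A minimal sketch, assuming the same name and valuex columns as above:

```python
import pandas as pd

dataf = pd.DataFrame(
    [("Bob", 2), ("John", 1), ("Bek", 10), ("Bob", 6)],
    columns=["name", "valuex"],
)

# factorize assigns integer codes in order of first appearance, starting at 0
codes, uniques = pd.factorize(dataf["name"])

# Shift the codes to 1-based and turn them into labels
dataf["name"] = [f"Person {code + 1}" for code in codes]

print(dataf)
```

Here Bob maps to Person 1 in both rows, John to Person 2, and Bek to Person 3.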

Update:

…
# Combine each row's columns into a plain tuple
dataf["Person_info"] = list(dataf.itertuples(index=False, name=None))

Output:


       name  valuex     Person_info
0  Person 1       2   (Person 1, 2)
1  Person 2       1   (Person 2, 1)
2  Person 0      10  (Person 0, 10)
3  Person 1       6   (Person 1, 6)

Upvotes: 1

Anurag Dabas

Reputation: 24322

Following this question, you can use the strip() and split() string methods (this assumes Person_info holds strings such as "(Bob, 2)"):

out = df['Person_info'].str.strip('()| ').str.split(',', n=1, expand=True)

Then use the factorize() method, as per this answer:

out[0]='Person' + pd.Series(pd.factorize(out[0])[0] + 1).astype(str)

Finally, use the agg() method to rebuild the tuples row by row:

df['Person_info'] = out.agg(tuple, axis=1)

Output of df:

    Person_info
0   (Person1, 2)
1   (Person2, 1)
2   (Person3, 10)
3   (Person1, 6)
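
The steps above rely on string methods, so they assume Person_info holds strings. If the column instead holds actual Python tuples, the question does not say which, a variant along these lines may work. This is only a sketch under that assumption; the names `names`, `values`, and `labels` are illustrative:

```python
import pandas as pd

# Assumption: Person_info holds real (name, value) tuples, not strings
df = pd.DataFrame(
    {"Person_info": [("Bob", 2), ("John", 1), ("Bek", 10), ("Bob", 6)]}
)

# Unpack the tuples into separate names and values
names = pd.Series([name for name, _ in df["Person_info"]])
values = [value for _, value in df["Person_info"]]

# factorize numbers the names in order of first appearance, starting at 0
codes, _ = pd.factorize(names)
labels = [f"Person {code + 1}" for code in codes]

# Rebuild the column as (label, value) tuples
df["Person_info"] = list(zip(labels, values))

print(df)
```

Because the values are never converted to strings, they keep their original type in the rebuilt tuples.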

Upvotes: 1
