Let's say I have the following data frame:
Person_info
(Bob, 2)
(John, 1)
(Bek, 10)
(Bob, 6)
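I would like to anonymise the names by replacing each one with a "Person N" label while keeping the values, so that the same name always gets the same label: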
Person_info
(Person 1, 2)
(Person 2, 1)
(Person 3, 10)
(Person 1, 6)
I found a simple way to anonymise data here, but it doesn't give me what I want.
Can anyone help me do this in Python with pandas?
Cast your names to category and get the category codes ;)
import pandas as pd

dataf = pd.DataFrame(
    [('Bob', 2),
     ('John', 1),
     ('Bek', 10),
     ('Bob', 6)], columns=['name', 'valuex'])

# Categories are the sorted unique names (Bek, Bob, John); each code is the position in that list
dataf["name"] = dataf["name"].astype("category").cat.codes.map(lambda x: f"Person {x}")
print(dataf)
name valuex
0 Person 1 2
1 Person 2 1
2 Person 0 10
3 Person 1 6
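To see why Bek becomes "Person 0", you can inspect the categories: pandas sorts the unique values, and each code is simply an index into that sorted list. A minimal sketch (the variable names here are just for illustration):

```python
import pandas as pd

# Same names as in the question, cast to category dtype
names = pd.Series(["Bob", "John", "Bek", "Bob"]).astype("category")

# Categories are the sorted unique values; a code is a position in this list
mapping = dict(enumerate(names.cat.categories))
print(mapping)
print(names.cat.codes.tolist())
```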
…
# Rebuild the (name, value) tuples row by row
dataf["Person_info"] = list(dataf.itertuples(index=False, name=None))
print(dataf)
name valuex Person_info
0 Person 1 2 (Person 1, 2)
1 Person 2 1 (Person 2, 1)
2 Person 0 10 (Person 0, 10)
3 Person 1 6 (Person 1, 6)
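Category codes follow alphabetical order and start at 0. If you want the exact numbering from the question (order of first appearance, starting at 1), pd.factorize can be swapped in for the category step. A minimal sketch, assuming Person_info already holds real (name, value) tuples rather than strings:

```python
import pandas as pd

# Assumed input: Person_info holds real (name, value) tuples
df = pd.DataFrame({"Person_info": [("Bob", 2), ("John", 1), ("Bek", 10), ("Bob", 6)]})

names = df["Person_info"].map(lambda t: t[0])   # first element of each tuple
values = df["Person_info"].map(lambda t: t[1])  # second element of each tuple

# factorize numbers names by first appearance: Bob -> 0, John -> 1, Bek -> 2
codes, uniques = pd.factorize(names)

# Shift codes to start at 1 and rebuild the tuples
df["Person_info"] = [(f"Person {c + 1}", v) for c, v in zip(codes, values)]
print(df)
```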
Following this question, and assuming Person_info holds strings such as "(Bob, 2)", you can use the strip() and split() methods:
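# Remove the parentheses and spaces at both ends, then split once on the first comma
out = df['Person_info'].str.strip('() ').str.split(',', n=1, expand=True)
out[1] = out[1].str.strip()  # drop the space left in front of each value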
Then use the factorize() method, as in this answer, to number the names in order of first appearance:
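# factorize codes start at 0, so add 1; reuse out's index so the new Series aligns with it
out[0] = 'Person' + pd.Series(pd.factorize(out[0])[0] + 1, index=out.index).astype(str)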
Finally, use the agg() method to turn each row back into a tuple:
df['Person_info'] = out.agg(tuple, axis=1)
Output of df:
Person_info
0 (Person1, 2)
1 (Person2, 1)
2 (Person3, 10)
3 (Person1, 6)
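A self-contained variant of the same idea, swapping the strip()/split() pair for a single str.extract() regex (my substitution, not part of the answer above). It assumes every string looks exactly like "(Bob, 2)", labels names "Person 1", "Person 2", and so on in order of first appearance, and converts the value back to an integer:

```python
import pandas as pd

# Assumed input: Person_info stored as strings such as "(Bob, 2)"
df = pd.DataFrame({"Person_info": ["(Bob, 2)", "(John, 1)", "(Bek, 10)", "(Bob, 6)"]})

# Capture the name and the value between the parentheses in one step
parts = df["Person_info"].str.extract(
    r"^\(\s*(?P<name>[^,]+?)\s*,\s*(?P<value>[^)]+?)\s*\)$"
)

# Number names by first appearance, starting at 1
codes, _ = pd.factorize(parts["name"])
labels = [f"Person {c + 1}" for c in codes]

# Rebuild real tuples, converting the value text to an integer
df["Person_info"] = list(zip(labels, parts["value"].astype(int)))
print(df)
```

Using extract() keeps the parsing in one place, but it will produce NaN rows for any string that does not match the pattern, so check for those if your data is less regular.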