Anonymise pandas data frame column

Question

Let's say I have the following data frame:

Person_info
(Bob, 2)
(John, 1)
(Bek, 10)
(Bob, 6)

I would like to anonymise the names while keeping the values, so that the same person always gets the same label:

Person_info
(Person 1, 2)
(Person 2, 1)
(Person 3, 10)
(Person 1, 6)

I found a simple way to anonymise here, but it does not produce the result I want.

Can anyone help me do this with pandas in Python?

Anurag Dabas · Accepted Answer

Following this question, you can use the str.strip() and str.split() methods to separate the name from the value:

out = df['Person_info'].str.strip('()| ').str.split(',', n=1, expand=True)

out[0] = 'Person' + pd.Series(pd.factorize(out[0])[0] + 1, index=out.index).astype(str)
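
The factorize() call above is what does the anonymising: it gives every distinct name an integer code in order of first appearance, so a repeated name gets the same code. A minimal sketch of that step in isolation, assuming the names have already been split out into their own Series (the variable names here are illustrative only):

```python
import pandas as pd

# Illustrative input: the name part of each row, already separated from the value.
names = pd.Series(["Bob", "John", "Bek", "Bob"])

# factorize returns integer codes (0-based, in order of first appearance)
# and the array of unique values those codes refer to.
codes, uniques = pd.factorize(names)

# Shift to 1-based numbering and prepend a prefix to build the labels.
labels = "Person" + pd.Series(codes + 1, index=names.index).astype(str)

print(labels.tolist())
```

Both occurrences of "Bob" map to the same label. Missing names would receive code -1, so they may need separate handling.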

Finally, use the agg() method to combine the two columns of each row back into a tuple:

df['Person_info'] = out.agg(tuple, axis=1)

Output of df:

    Person_info
0   (Person1, 2)
1   (Person2, 1)
2   (Person3, 10)
3   (Person1, 6)
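
For reference, the steps can be put together into one runnable sketch. It assumes each cell of Person_info is a string such as "(Bob, 2)" (the question does not say whether the cells are strings or tuples), and the helper name anonymise_person_info is hypothetical. Unlike the agg(tuple) step, this variant writes the result back as strings:

```python
import pandas as pd


def anonymise_person_info(df, column="Person_info", prefix="Person"):
    """Hypothetical helper: replace names in '(name, value)' strings with numbered labels."""
    # Drop the surrounding parentheses, then split on the first comma only.
    parts = df[column].str.strip("() ").str.split(",", n=1, expand=True)
    names = parts[0].str.strip()
    values = parts[1].str.strip()

    # Same name -> same integer code, numbered in order of first appearance.
    codes, _ = pd.factorize(names)
    labels = prefix + pd.Series(codes + 1, index=df.index).astype(str)

    # Work on a copy so the caller's frame is left untouched.
    result = df.copy()
    result[column] = "(" + labels + ", " + values + ")"
    return result


df = pd.DataFrame(
    {"Person_info": ["(Bob, 2)", "(John, 1)", "(Bek, 10)", "(Bob, 6)"]},
    index=[10, 11, 12, 13],  # non-default index, to show the labels stay aligned
)
anonymised = anonymise_person_info(df)
print(anonymised)
```

Passing index=df.index when building the labels keeps them aligned with the original rows even when the frame does not use the default 0..n-1 index.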
