Shew

Reputation: 1596

pandas dataframe category codes from two columns

I have a pandas DataFrame in which two columns hold the names of people. The columns are related: the same name refers to the same person in both. I want to assign category codes that are valid across the whole "name" space, i.e. shared by both columns.

For example, my DataFrame is:

df = pd.DataFrame({"P1":["a","b","c","a"], "P2":["b","c","d","c"]})

>>> df
  P1 P2
0  a  b
1  b  c
2  c  d
3  a  c

I want it to be replaced by the corresponding category codes, such that

>>> df
   P1  P2
0   1   2
1   2   3
2   3   4
3   1   3

The categories are in fact derived from the concatenated array ["a","b","c","d"] and applied to the individual columns separately. How can I achieve this?

Upvotes: 4

Views: 1616

Answers (2)

jezrael

Reputation: 863351

Use:

print (df.stack().rank(method='dense').astype(int).unstack())
   P1  P2
0   1   2
1   2   3
2   3   4
3   1   3

EDIT:

For a more general solution I adapted the approach from the other answer, because stack/unstack runs into problems when the index contains duplicates:

df = pd.DataFrame({"P1":["a","b","c","a"],
                   "P2":["b","c","d","c"],
                   "A":[3,4,5,6]}, index=[2,2,3,3])

print (df)
   A P1 P2
2  3  a  b
2  4  b  c
3  5  c  d
3  6  a  c

cols = ['P1','P2']
df[cols] = (pd.factorize(df[cols].values.ravel())[0]+1).reshape(-1, len(cols))
print (df)
   A  P1  P2
2  3   1   2
2  4   2   3
3  5   3   4
3  6   1   3
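
One thing to keep in mind with factorize: by default it numbers values in order of first appearance in the flattened array, which only happens to match alphabetical order for the sample data. If the codes should follow sorted order regardless of where a name first shows up, passing sort=True should do it. A minimal sketch, with the sample frame reshuffled by me to make the difference visible (variable names are just for illustration):

```python
import pandas as pd

# Reshuffled sample data (an assumption for illustration): "c" appears first.
df = pd.DataFrame({"P1": ["c", "b", "a", "c"], "P2": ["b", "a", "d", "a"]})
cols = ["P1", "P2"]
flat = df[cols].to_numpy().ravel()  # row-major: c, b, b, a, a, d, c, a

# Default: codes follow order of first appearance (c=0, b=1, a=2, d=3).
codes_seen, uniques_seen = pd.factorize(flat)

# sort=True: uniques are sorted, so codes follow sorted order (a=0, b=1, c=2, d=3).
codes_sorted, uniques_sorted = pd.factorize(flat, sort=True)

# Shift to 1-based codes and restore the original shape, index and column names.
by_appearance = pd.DataFrame(
    (codes_seen + 1).reshape(-1, len(cols)), index=df.index, columns=cols
)
by_sorted = pd.DataFrame(
    (codes_sorted + 1).reshape(-1, len(cols)), index=df.index, columns=cols
)

print(by_appearance)
print(by_sorted)
```

Both variants give a consistent mapping across the two columns; they differ only in which name gets which number.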

Upvotes: 2

Zero

Reputation: 77017

You can do

In [465]: pd.DataFrame((pd.factorize(df.values.ravel())[0]+1).reshape(df.shape), 
                       columns=df.columns)
Out[465]:
   P1  P2
0   1   2
1   2   3
2   3   4
3   1   3
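
If you want actual category codes, as the question's wording suggests, another option is to build one shared category list from both columns and encode each column against it with pd.Categorical. This is only a sketch under the assumption that there are no missing values (a value that is not in the category list gets code -1, so it would end up as 0 after the +1 shift); the names `names` and `coded` are mine:

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"], "P2": ["b", "c", "d", "c"]})

# One sorted category list covering every name in either column.
names = sorted(pd.unique(df.to_numpy().ravel()))

# Encode each column against the same categories; +1 makes the codes 1-based.
coded = pd.DataFrame(
    {col: pd.Categorical(df[col], categories=names).codes + 1 for col in df.columns},
    index=df.index,
)

print(coded)
```

Because every column is encoded against the same list, a given name maps to the same code wherever it appears.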

Upvotes: 2
