Reputation: 7235
I have a dafarame like the following:
df
Name1 Name2
0 John Jack
1 John Albert
2 Jack Eva
3 Albert Sara
4 Eva Sara
I want to assign to each a name a unique ID
. So:
df
Name1 Name2 ID1 ID2
0 John Jack 0 1
1 John Albert 0 2
2 Jack Eva 1 3
3 Albert Sara 2 5
4 Eva Sara 3 5
Upvotes: 4
Views: 213
Reputation: 78690
If it's not important which name gets which number, you can also consider
df.join(df.stack().astype('category').cat.codes.unstack()
.rename(columns=lambda c: c.replace('Name', 'ID')))
which produces
Name1 Name2 ID1 ID2
0 John Jack 3 2
1 John Albert 3 0
2 Jack Eva 2 1
3 Albert Sara 0 4
4 Eva Sara 1 4
Upvotes: 1
Reputation: 862681
First flatten values by numpy.ravel
and reshape by original df
, use DataFrame constructor and create columns names, last join
to original:
df1 = pd.DataFrame(pd.factorize(df.values.ravel())[0].reshape(df.shape))
df1.columns = ['ID{}'.format(x+1) for x in range(len(df1.columns))]
print (df1)
ID1 ID2
0 0 1
1 0 2
2 1 3
3 2 4
4 3 4
df = df.join(df1)
print (df)
Name1 Name2 ID1 ID2
0 John Jack 0 1
1 John Albert 0 2
2 Jack Eva 1 3
3 Albert Sara 2 4
4 Eva Sara 3 4
Create MultiIndex Series
by stack
, create id
s by factorize
and for DataFrame
unstack
, then rename
columns and add to original by join
:
s = df.stack()
df = df.join(pd.Series(pd.factorize(s)[0], index=s.index)
.unstack()
.rename(columns=lambda x: x.replace('Name','ID')))
print (df)
Name1 Name2 ID1 ID2
0 John Jack 0 1
1 John Albert 0 2
2 Jack Eva 1 3
3 Albert Sara 2 4
4 Eva Sara 3 4
Similar alternative:
s = df.stack()
s[:] = pd.factorize(s)[0]
df = df.join(s.unstack().rename(columns=lambda x: x.replace('Name','ID')))
print (df)
Name1 Name2 ID1 ID2
0 John Jack 0 1
1 John Albert 0 2
2 Jack Eva 1 3
3 Albert Sara 2 4
4 Eva Sara 3 4
Upvotes: 3