Shew

Reputation: 1596

Get category type from codes of merged columns

I created unique numeric codes from two columns of a DataFrame. Now I would like to find the mapping from the numeric codes back to the original values.

For example,

import pandas as pd

df = pd.DataFrame({"P1":["a","b","c","a"],
                   "P2":["b","c","d","c"],
                   "A":[3,4,5,6]}, index=[2,2,3,3])

print (df)
   A P1 P2
2  3  a  b
2  4  b  c
3  5  c  d
3  6  a  c

cols = ['P1','P2']
df[cols] = (pd.factorize(df[cols].values.ravel())[0]+1).reshape(-1, len(cols))
print (df)
   A  P1  P2
2  3   1   2
2  4   2   3
3  5   3   4
3  6   1   3

Now I want to get the mapping as a dictionary:

a => 1
b => 2
c => 3
d => 4

How can I get it?
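A related approach, sketched here under the assumption that a pandas categorical dtype is acceptable: cast both columns to one shared CategoricalDtype whose categories are the unique values in order of first appearance (the same order pd.factorize uses on the row-major ravel). The mapping is then just the category list enumerated from 1. The names uniques, cat_type, codes and mapping are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"],
                   "P2": ["b", "c", "d", "c"],
                   "A": [3, 4, 5, 6]}, index=[2, 2, 3, 3])
cols = ["P1", "P2"]

# Unique values across both columns, in order of first appearance
uniques = pd.unique(df[cols].to_numpy().ravel())

# One dtype shared by both columns, so equal values get equal codes
cat_type = pd.CategoricalDtype(categories=uniques)
cats = df[cols].astype(cat_type)

# .cat.codes is 0-based; shift by 1 to match the numbering in the question
codes = cats.apply(lambda s: s.cat.codes + 1)

# Code -> value relationship comes straight from the shared categories
mapping = {cat: i for i, cat in enumerate(cat_type.categories, start=1)}
print(mapping)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Keeping the columns categorical also means the original labels stay attached to the data, so no separate dictionary has to be carried around.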

Upvotes: 2

Views: 102

Answers (2)

Alex

Reputation: 19124

Suggestion: don't transform the DataFrame in place first. Create the mapping, then apply it:

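import numpy as np
import pandas as pd

orig = pd.unique(df[cols].values.flatten())  # uniques in order of first appearance
code_map = dict(zip(orig, np.arange(orig.size)))  # 0-based codes
df[cols] = df[cols].applymap(code_map.__getitem__)  # DataFrame.map in pandas >= 2.1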

code_map  # returns {'a': 0, 'b': 1, 'c': 2, 'd': 3}

df  # returns

   A  P1  P2
2  3   0   1
2  4   1   2
3  5   2   3
3  6   0   2
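Once the forward mapping exists, the code-to-value direction asked for in the question is just its inverse. The sketch below assumes 1-based codes (to match the question) and uses Series.map per column instead of applymap, which pandas 2.1 deprecated in favour of DataFrame.map. The names inverse_map, encoded and decoded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"],
                   "P2": ["b", "c", "d", "c"],
                   "A": [3, 4, 5, 6]}, index=[2, 2, 3, 3])
cols = ["P1", "P2"]

# Value -> code, in order of first appearance, 1-based as in the question
orig = pd.unique(df[cols].to_numpy().ravel())
code_map = {val: code for code, val in enumerate(orig, start=1)}

# Code -> value: swap keys and values of the forward mapping
inverse_map = {code: val for val, code in code_map.items()}

# Encode each column with the forward mapping
encoded = df.copy()
encoded[cols] = df[cols].apply(lambda s: s.map(code_map))

# Decode back to the original labels with the inverse mapping
decoded = encoded.copy()
decoded[cols] = encoded[cols].apply(lambda s: s.map(inverse_map))
```

Because every code is unique per value, inverting the dictionary is lossless, so decoding restores the original columns exactly.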

Upvotes: 1

jezrael
jezrael

Reputation: 863361

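You can index the second array returned by factorize (the uniques) with the first (the codes) to expand it to full length, then zip the result with the codes plus 1 and convert it to a dict: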

cols = ['P1','P2']
a = pd.factorize(df[cols].values.ravel())

d = dict(zip(a[1][a[0]], a[0]+1))
print (d)
{'d': 4, 'b': 2, 'c': 3, 'a': 1}

df[cols] = (a[0]+1).reshape(-1, len(cols))
print (df)
   A  P1  P2
2  3   1   2
2  4   2   3
3  5   3   4
3  6   1   3

Detail:

print (a)
(array([0, 1, 1, 2, 2, 3, 0, 2], dtype=int64), array(['a', 'b', 'c', 'd'], dtype=object))

print (a[1][a[0]])
['a' 'b' 'b' 'c' 'c' 'd' 'a' 'c']

print (a[0] + 1)
[1 2 2 3 3 4 1 3]
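As the detail shows, the uniques array (a[1]) is already ordered by code: uniques[i] is the value that received code i. So a minimal variant, assuming the same 1-based numbering, can build the dictionary by enumerating the uniques directly, without the a[1][a[0]] expansion. The variable names codes, uniques and mapping are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"],
                   "P2": ["b", "c", "d", "c"],
                   "A": [3, 4, 5, 6]}, index=[2, 2, 3, 3])
cols = ["P1", "P2"]

# factorize returns (codes, uniques); uniques[i] is the value with code i
codes, uniques = pd.factorize(df[cols].to_numpy().ravel())

# Enumerate from 1 so the dictionary matches codes + 1
mapping = {val: i for i, val in enumerate(uniques, start=1)}

# Same reshaping as in the answer: one row of codes per DataFrame row
df[cols] = (codes + 1).reshape(-1, len(cols))
```

This avoids building a temporary array as long as the data and relying on dict() to deduplicate repeated pairs.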

Upvotes: 1
