Adding unique identifiers to duplicate values in pandas dataframe

Question

I would like to create unique identifiers for values that are duplicates. Values that are duplicates are only 0's. The idea is to convert each zero to zero plus its position (0+1 for first row, 0+2 for second row etc). However the problem is the column also has other non duplicate values.

I have written this line of code to try and convert the zero values as stated but I am getting this error message

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('

Here is my code

seller_customer['customer_id'] = np.where(seller_customer['customer_id']==0, seller_customer['customer_id'] + seller_customer.groupby(['customer_id']).cumcount().replace('0',''))

Here is a sample of my data

{0: '7e468d618e16c6e1373fb2c4a522c969',
 1: '1c14a115bead8a332738c5d7675cca8c',
 2: '434dee65d973593dbb8461ba38202798',
 3: '4bbeac9d9a22f0628ba712b90862df28',
 4: '578d5098cbbe40771e1229fea98ccafd',
 5:  0,
 6:  0,
 7:  0}

arinarmo · Accepted Answer

You can do something like this:

    from pandas.util import hash_pandas_object
    import numpy as np
    df.x = np.where(df.x == 0, hash_pandas_object(df.x), df.x)
    df

Output:

                                      x
    0  7e468d618e16c6e1373fb2c4a522c969
    1  1c14a115bead8a332738c5d7675cca8c
    2  434dee65d973593dbb8461ba38202798
    3  4bbeac9d9a22f0628ba712b90862df28
    4  578d5098cbbe40771e1229fea98ccafd
    5                593769213749726025
    6              14559158595676751865
    7               4575103004772269825

They won't be sequential like the index but they will be unique (almost certainly, unless you encounter a hash collision)

Adding unique identifiers to duplicate values in pandas dataframe

Answers (2)

Related Questions