Shew

Reputation: 1596

Replace a pandas DataFrame column's string values with unique integer ids

I have a DataFrame with millions of rows, and one of its columns, 'TYPE', holds strings. This column has 400 distinct values in total, and I want to replace them with integer ids from 1 to 400. I also want to export the 'TYPE' => id mapping as a dictionary for future reference. I tried to_dict, but it did not help. Is there any way to do this?

Upvotes: 2

Views: 780

Answers (1)

MaxU - stand with Ukraine

Reputation: 210852

Option 1: you can use pd.factorize:

df['new'] = pd.factorize(df['str_col'])[0] + 1
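A minimal sketch of Option 1 on toy data, assuming the column is called 'TYPE' as in the question (the 'TYPE_ID' column and the type_to_id name are illustrative only). pd.factorize returns both the 0-based codes and the unique labels, and its ids follow the order in which labels first appear, so the lookup dictionary comes straight out of the second return value. Missing values get code -1, which would become 0 after the shift.

```python
import pandas as pd

# Small stand-in for the real data; 'TYPE' is the column from the question.
df = pd.DataFrame({"TYPE": ["b", "a", "c", "a", "b"]})

# factorize returns (codes, uniques); codes are 0-based, in order of first appearance.
codes, uniques = pd.factorize(df["TYPE"])
df["TYPE_ID"] = codes + 1  # shift to 1-based ids

# uniques[i] is the label whose id is i + 1, so the lookup table falls out directly.
type_to_id = {label: i + 1 for i, label in enumerate(uniques)}

print(df["TYPE_ID"].tolist())  # [1, 2, 3, 2, 1]
print(type_to_id)              # {'b': 1, 'a': 2, 'c': 3}
```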

Option 2: use the category dtype:

df['new'] = df['str_col'].astype('category').cat.codes + 1
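A quick sketch of Option 2 on the same toy data (again, 'TYPE_ID' is just an illustrative name). It shows one practical difference from Option 1: by default the categories are sorted, so the ids follow alphabetical order rather than order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({"TYPE": ["b", "a", "c", "a", "b"]})

# Categories are sorted by default, so ids follow alphabetical order here,
# unlike factorize, which numbers labels by first appearance.
df["TYPE_ID"] = df["TYPE"].astype("category").cat.codes + 1

print(df["TYPE_ID"].tolist())  # [2, 1, 3, 1, 2]
```

Either numbering is valid; just make sure the exported dictionary is built the same way as the ids.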

Or, even better, just convert the column to the categorical dtype:

df['str_col'] = df['str_col'].astype('category')

and when you need numbers instead, use the category codes (they are 0-based, so add 1 for ids starting at 1):

df['str_col'].cat.codes

Thanks to @jezrael for extending the answer. To create the 'TYPE' => id dictionary (matching cat.codes + 1):

cats = df['str_col'].cat.categories
d = dict(zip(cats, range(1, len(cats) + 1)))
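The question also asks to export the mapping for future reference. One way to do that, sketched under the assumption that JSON is an acceptable format, is to dump the dictionary to a file and reload it later to translate ids back into labels. The file name type_ids.json and the variable names are hypothetical.

```python
import json
import pandas as pd

df = pd.DataFrame({"TYPE": ["b", "a", "c", "a", "b"]})
df["TYPE"] = df["TYPE"].astype("category")

# Same mapping as in the answer: category label -> (code + 1).
cats = df["TYPE"].cat.categories
type_to_id = dict(zip(cats, range(1, len(cats) + 1)))

# Persist the lookup table; "type_ids.json" is a hypothetical file name.
with open("type_ids.json", "w") as f:
    json.dump(type_to_id, f, indent=2)

# Later: reload it and build the reverse mapping id -> label.
with open("type_ids.json") as f:
    loaded = json.load(f)
id_to_type = {v: k for k, v in loaded.items()}

# Integer ids consistent with the dictionary, then mapped back to labels.
ids = df["TYPE"].cat.codes + 1
restored = ids.map(id_to_type)
print(restored.tolist())  # ['b', 'a', 'c', 'a', 'b']
```

JSON object keys are always strings, which is why the dictionary is stored as label => id and the reverse (int-keyed) mapping is rebuilt after loading.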

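PS: the category dtype is also very memory-efficient, because each distinct string is stored only once and the column itself holds small integer codes.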
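To check the memory claim on data shaped like the question's, here is a rough sketch using synthetic labels (one million rows drawn from 400 made-up strings such as "type_000"; none of this is the asker's real data). With 400 categories pandas stores the codes as int16, i.e. two bytes per row, while a string column has to store the text for every row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000,000 rows drawn from 400 distinct type labels.
rng = np.random.default_rng(0)
labels = np.array([f"type_{i:03d}" for i in range(400)], dtype=object)
s = pd.Series(rng.choice(labels, size=1_000_000))
s_cat = s.astype("category")

# deep=True counts the actual string payloads, not just the pointers.
as_object = s.memory_usage(deep=True)
as_category = s_cat.memory_usage(deep=True)

print(as_object, as_category)
print(s_cat.cat.codes.dtype)  # int16 (400 categories do not fit in int8)
```

The exact byte counts depend on the pandas version and string storage, so they are printed rather than stated here.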

Upvotes: 2
