Reputation: 1596
I have a dataframe with millions of rows. One of its columns, 'TYPE', holds strings and has 400 distinct values in total. I want to replace those values with integer ids from 1 to 400, and I also want to export the 'TYPE' => id dictionary for future reference. I tried to_dict, but it did not help. Is there any way to do this?
Upvotes: 2
Views: 780
Reputation: 210852
Option 1: you can use pd.factorize:
df['new'] = pd.factorize(df['str_col'])[0]+1
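pd.factorize also returns the unique values, so the value => id dictionary can come from the same call. A minimal sketch on a small made-up frame (the sample data and the name mapping are assumptions for illustration); ids follow the order of first appearance:

```python
import pandas as pd

# Hypothetical sample data; the real frame has millions of rows.
df = pd.DataFrame({'str_col': ['b', 'a', 'b', 'c', 'a']})

# factorize returns (codes, uniques); codes are 0-based,
# numbered in order of first appearance.
codes, uniques = pd.factorize(df['str_col'])
df['new'] = codes + 1

# value -> id dictionary, consistent with the 'new' column
mapping = {value: i for i, value in enumerate(uniques, start=1)}
```

Missing values get code -1 from factorize, so they end up as 0 after the +1 shift and do not appear in mapping.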
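Option 2: using category dtype (ids follow the sorted order of the categories, so they can differ from the pd.factorize ids, which follow order of first appearance):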
df['new'] = df['str_col'].astype('category').cat.codes+1
or even better just convert it to categorical dtype:
df['str_col'] = df['str_col'].astype('category')
and when you need numbers instead, use the category codes (they are 0-based, so add 1 for ids starting at 1):
df['str_col'].cat.codes
Thanks to @jezrael for extending the answer. To create the dictionary (this needs str_col to be category dtype already, and the ids match cat.codes + 1):
cats = df['str_col'].cat.categories
d = dict(zip(cats, range(1, len(cats) + 1)))
PS: the category dtype is very memory-efficient too.
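For the "export for future reference" part, the category snippets above can be combined with a JSON dump. This is only a sketch: the sample data, the type_id column and the type_ids.json file name are made up for illustration, and any other serialization would work as well.

```python
import json
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({'str_col': ['b', 'a', 'b', 'c', 'a']})
df['str_col'] = df['str_col'].astype('category')

# Categories are sorted, so ids are assigned in sorted order.
cats = df['str_col'].cat.categories
d = dict(zip(cats, range(1, len(cats) + 1)))

# Integer ids starting at 1, consistent with the dictionary d
df['type_id'] = df['str_col'].cat.codes + 1

# Export the mapping (hypothetical file name) and read it back
with open('type_ids.json', 'w') as f:
    json.dump(d, f, indent=2)

with open('type_ids.json') as f:
    restored = json.load(f)
```

Reading the file back gives the same string => int dictionary, so it can be reused later to decode the ids.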
Upvotes: 2