corvusMidnight

Reputation: 648

Replacing multiple string values in a column with numbers in pandas

I am currently working on a pandas data frame named df. One column contains many distinct string labels (more than 100).

I know how to replace values when there are only a few of them.

For instance, in the typical Titanic example:

titanic.Sex.replace({'male': 0,'female': 1}, inplace=True)

Of course, doing so for 100+ values would be extremely time-consuming. I have seen similar questions, but all the answers involve typing out the mapping by hand. Is there a faster way to do this?

Upvotes: 0

Views: 171

Answers (1)

mozway

Reputation: 262604

I think you're looking for factorize:

import pandas as pd
df = pd.DataFrame({'col': list('ABDCEBJZACA')})
# factorize() returns (codes, uniques); [0] keeps only the integer codes
df['factor'] = df['col'].factorize()[0]

output:

   col  factor
0    A       0
1    B       1
2    D       2
3    C       3
4    E       4
5    B       1
6    J       5
7    Z       6
8    A       0
9    C       3
10   A       0
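
If you also need to know which number was assigned to which label, the second value returned by factorize holds the unique labels in order of first appearance. Here is a minimal sketch, assuming the same sample column as in the output above; the names mapping, restored, sorted_codes and sorted_uniques are only illustrative:

```python
import pandas as pd

# Same sample data as the output shown above
df = pd.DataFrame({'col': list('ABDCEBJZACA')})

# factorize returns a pair: integer codes and the unique labels,
# numbered in order of first appearance
codes, uniques = pd.factorize(df['col'])
df['factor'] = codes

# Explicit label -> number lookup, built without typing anything by hand
mapping = {label: code for code, label in enumerate(uniques)}

# Going back from codes to the original labels
restored = uniques.take(codes)

# With sort=True the codes follow the sorted order of the labels instead
sorted_codes, sorted_uniques = pd.factorize(df['col'], sort=True)
```

Note that by default factorize marks missing values with the code -1, so it may be worth checking for NaN in the column before relying on the codes.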

Upvotes: 1
