Reputation: 479
Hi, I'm new to Python and pandas.
I have extracted the unique values of one of the columns using pandas. The unique values are strings:
['Others, Senior Management-Finance, Senior Management-Sales'
 'Consulting, Strategic planning, Senior Management-Finance'
 'Client Servicing, Quality Control - Product/ Process, Strategic planning'
 'Administration/ Facilities, Business Analytics, Client Servicing'
 'Sales & Marketing, Sales/ Business Development/ Account Management, Sales Support']
I want to replace each string value with a unique integer.
For simplicity, here is a dummy input and output.
Input:
Col1
A
A
B
B
B
C
C
The unique values of the column come out as below:
[ 'A' 'B' 'C' ]
After replacing, the column should look like this:
Col1
1
1
2
2
2
3
3
Please suggest how I can do this, with a loop or any other way, because I have more than 300 unique values.
Upvotes: 9
Views: 8907
Reputation: 862406
Use pd.factorize:
df['Col1'] = pd.factorize(df.Col1)[0] + 1
print (df)
Col1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
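For a self-contained version of the factorize approach, here is a minimal sketch using the dummy data from the question. The names codes, uniques and mapping are only illustrative. pd.factorize also returns the unique values, so you can keep a lookup from each integer back to its original string:

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# factorize returns (codes, uniques); codes start at 0 and follow
# the order in which each value first appears in the column
codes, uniques = pd.factorize(df["Col1"])

# Lookup from the 1-based integer back to the original string (illustrative)
mapping = {i + 1: label for i, label in enumerate(uniques)}

df["Col1"] = codes + 1  # shift so the first value maps to 1
print(df)
print(mapping)
```

With 300+ unique strings this works the same way, and mapping lets you translate the integers back later.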
Another solution uses numpy.unique, but it is slower on a huge DataFrame:
_,idx = np.unique(df['Col1'],return_inverse=True)
df['Col1'] = idx + 1
print (df)
Col1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
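One difference to be aware of: the two approaches can number the values differently when the column is not already sorted. A small sketch on made-up, deliberately unsorted data (the Series s and the variable names are just for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data, deliberately not in sorted order
s = pd.Series(["B", "A", "B", "C", "A"])

# factorize: codes follow order of first appearance (B=0, A=1, C=2)
codes_appearance, _ = pd.factorize(s)

# np.unique: codes index into the sorted unique values (A=0, B=1, C=2)
_, codes_sorted = np.unique(s.to_numpy(), return_inverse=True)

# factorize with sort=True gives the same numbering as np.unique
codes_factorize_sorted, _ = pd.factorize(s, sort=True)

print(codes_appearance)
print(codes_sorted)
print(codes_factorize_sorted)
```

For the A, B, C example in the question the results are identical because the values already appear in sorted order.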
Finally, you can convert the values to categorical, mainly for lower memory usage:
df['Col1'] = pd.factorize(df.Col1)[0]
df['Col1'] = df['Col1'].astype("category")
print (df)
Col1
0 0
1 0
2 1
3 1
4 1
5 2
6 2
print (df.dtypes)
Col1 category
dtype: object
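A different route from the snippet above, in case it is useful: convert the strings themselves to a categorical and read the integer codes from it. This is only a sketch on the question's dummy data. The column name Col1_code and the variable labels are made up for illustration:

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# Convert the strings to a categorical; the categories come out sorted
cat = df["Col1"].astype("category")

# .cat.codes holds one 0-based integer per row (-1 marks missing values),
# so add 1 to get the 1-based numbering asked for in the question
df["Col1_code"] = cat.cat.codes + 1

# .cat.categories keeps the original strings for reverse lookup
labels = list(cat.cat.categories)

print(df)
print(labels)
```

This keeps the original strings available as categories, whereas factorizing first and then casting to category stores only the integers.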
Upvotes: 8