Python Pandas convert multiple string columns to specified integer values

Question

I have a dataframe with thousands of rows, and several columns all hold ratings such as A, B, C, D. I am trying to do some machine learning and would like to give the ratings specific values, such as A=32, B=16, C=4, D=2. I have read some posts on using factorize and LabelEncoder.

I got a simple method to work (while trying to explain the problem) from the link, but I would like to know how to use the other methods. I don't know how to tell those methods to use specific values; they seem to assign their own values to the data. The method below works if only a few columns need to be transformed.

import pandas as pd

df = pd.DataFrame({'Studentid':['12','40','36'],
               'history':['A','C','C'],
               'math':['B','C','D'],
               'biology':['A','C','B']})

print(df)

    Studentid history math biology
0        12       A    B       A
1        40       C    C       C
2        36       C    D       B


df['history1'] = df['history'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['math1'] = df['math'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['biology1'] = df['biology'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])

    Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16

SeaBean · Accepted Answer

If you need to transform a relatively large number of columns, you probably don't want to type out every column name one by one in the code. You can do it this way:

Assuming the column Studentid is not going to be transformed:

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

df_transformed = df.drop('Studentid', axis=1).replace(grade_map).add_suffix('1')
df = df.join(df_transformed)

We exclude the column Studentid from the transformation by dropping it first with .drop(), and then use .replace() to translate the grades. This way Studentid is never translated, even if a student id happens to be identical to one of the grade letters. We add the suffix 1 to all transformed columns with .add_suffix().
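As a small illustration of why the id column is dropped first, here is a sketch on made-up data. The frame demo and its id value 'A' are hypothetical and not part of the question's data; they only show what .replace() does when an id equals a grade letter.

```python
import pandas as pd

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

# Hypothetical data: the first student id happens to be the string 'A'.
demo = pd.DataFrame({'Studentid': ['A', '40'],
                     'history': ['A', 'C']})

# Replacing on the whole frame also rewrites the id that equals 'A'.
whole = demo.replace(grade_map)

# Dropping the id column first means only the grade columns are translated.
grades_only = demo.drop('Studentid', axis=1).replace(grade_map)

print(whole)
print(grades_only)
```

Note that .replace() with a plain dict matches whole cell values, so an id such as '12A' would not be affected either way; only an id exactly equal to a key in grade_map is at risk.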

After the transformation, we join the original dataframe with these transformed columns by using .join()

Result:

print(df)

  Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16
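A possible variation, not part of the approach above: if you would rather have unexpected ratings show up as NaN instead of being left as strings, Series.map can be applied to each grade column. This is a minimal sketch that assumes every column other than Studentid is a grade column; the names grade_cols, mapped and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Studentid': ['12', '40', '36'],
                   'history': ['A', 'C', 'C'],
                   'math': ['B', 'C', 'D'],
                   'biology': ['A', 'C', 'B']})

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

# Assumption: every column except the id column holds grades.
grade_cols = [c for c in df.columns if c != 'Studentid']

# Series.map returns NaN for any value missing from grade_map,
# whereas replace would leave such a value unchanged.
mapped = df[grade_cols].apply(lambda col: col.map(grade_map)).add_suffix('1')

# Attach the mapped columns to the original frame.
result = df.join(mapped)
print(result)
```

With the sample data every rating is in grade_map, so the new columns hold the same numbers as in the result above. A rating outside A to D would become NaN, which is easy to spot with result.isna().sum() before training a model.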
