Reputation: 47
I have a dataframe with thousands of rows, and some of its columns contain ratings such as A, B, C, D. I am trying to do some machine learning and would like to assign specific values to the ratings, e.g. A=32, B=16, C=4, D=2. I have read some posts on using factorize and LabelEncoder.
While trying to explain the problem, I got a simple method from the link to work, but I would like to know how to use the other methods. I don't know how to tell them to use specific values; they seem to assign their own values to the data. The method below is fine when only a few columns need to be transformed.
import pandas as pd
df = pd.DataFrame({'Studentid': ['12', '40', '36'],
                   'history': ['A', 'C', 'C'],
                   'math': ['B', 'C', 'D'],
                   'biology': ['A', 'C', 'B']})
print(df)
Studentid history math biology
0 12 A B A
1 40 C C C
2 36 C D B
df['history1'] = df['history'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['math1'] = df['math'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['biology1'] = df['biology'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
Studentid history math biology history1 math1 biology1
0 12 A B A 32 16 32
1 40 C C C 4 4 4
2 36 C D B 4 2 16
Upvotes: 0
Views: 320
Reputation: 23237
If you need to transform a relatively large number of columns, you probably don't want to write out every column name one by one in the code. You can do it this way, assuming the column Studentid is not going to be transformed:
grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}
df_transformed = df.drop('Studentid', axis=1).replace(grade_map).add_suffix('1')
df = df.join(df_transformed)
We exclude the column Studentid from the transformation by dropping it first with .drop(), and then use .replace() to translate the grades. This way Studentid is never translated, even if a student id happens to match one of the grade letters. We add the suffix 1 to all transformed columns with .add_suffix(). After the transformation, we join the original dataframe with these transformed columns using .join().
Result:
print(df)
Studentid history math biology history1 math1 biology1
0 12 A B A 32 16 32
1 40 C C C 4 4 4
2 36 C D B 4 2 16
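As a possible variation on the steps above, here is a minimal sketch that uses Series.map instead of .replace(). It assumes that every column other than Studentid holds grades; the names grade_cols, encoded and result are only illustrative. One difference worth knowing: with .map(), any value that is not a key in grade_map becomes NaN rather than being left unchanged, which can make unexpected grades easier to spot.

```python
import pandas as pd

df = pd.DataFrame({
    'Studentid': ['12', '40', '36'],
    'history': ['A', 'C', 'C'],
    'math': ['B', 'C', 'D'],
    'biology': ['A', 'C', 'B'],
})

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

# Assumption: every column except the identifier is a grade column.
grade_cols = df.columns.drop('Studentid')

# Series.map looks each value up in the dict; values missing from
# grade_map would become NaN instead of being passed through.
encoded = df[grade_cols].apply(lambda col: col.map(grade_map)).add_suffix('1')

# Attach the encoded columns next to the original ones.
result = df.join(encoded)
print(result)
```

If the dataframe also contains non-grade columns besides Studentid, grade_cols would need to be an explicit list of the columns to encode.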
Upvotes: 1