Reputation: 49
I have a DataFrame where one column is grade data. It spans from A+, A, A-, etc. all the way down to F. These are stored as categories. I want to convert them efficiently into numbers, such that the best grade gets the highest number. Since there are 13 grades, A+ should get the value 13 and F should get the value 1.
For instance (but with categories instead of strings):
import pandas as pd

grades = ['A+', 'C-', 'F', 'B', 'D-']
students = ['billy', 'bob', 'joe', 'tom', 'jamal']
df = pd.DataFrame(columns=['grades'], data=grades, index=students)
I would like to turn the grades column of this DataFrame into numeric values ranging from 1 to 13, corresponding to the categories F and A+ respectively. I'm not really sure how to go about this.
EDIT: this is also a MultiIndex DataFrame. The first index level is the date, the second is the name, and the grade is the value.
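A minimal sketch of the data shape the EDIT describes, with made-up dates, names and grades (the level names `date` and `name` are assumptions). It also shows why existing category codes are not directly usable if the categories were created without an explicit order: pandas then sorts them as strings, so the codes follow alphabetical order rather than grade quality.

```python
import pandas as pd

# Hypothetical sample data shaped like the EDIT:
# a (date, name) MultiIndex with one categorical "grades" column
index = pd.MultiIndex.from_tuples(
    [
        ("2020-01-01", "billy"),
        ("2020-01-01", "bob"),
        ("2020-02-01", "billy"),
        ("2020-02-01", "bob"),
    ],
    names=["date", "name"],
)
df = pd.DataFrame({"grades": pd.Categorical(["A+", "C-", "F", "B"])}, index=index)

# No explicit category order given, so pandas sorts the categories as strings
print(df["grades"].cat.categories.tolist())  # ['A+', 'B', 'C-', 'F']

# The codes are positions in that alphabetical list: the best grade gets 0
print(df["grades"].cat.codes.tolist())  # [0, 2, 3, 1]
```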
Upvotes: 1
Views: 856
Reputation: 402573
Most of your problems go away once you declare these values as Categorical items.
import pandas as pd

s = pd.Series(['C+', 'A+', 'D+', 'D', 'D', 'A+', 'C', 'D+', 'C+', 'A+', 'A-', 'F',
               'B', 'D+', 'D-', 'A+', 'A+', 'D-', 'A', 'B-'])
# List the grades from worst to best, so F gets code 0 and A+ gets code 12
cats = 'A+ A A- B+ B B- C+ C C- D+ D D- F'.split()[::-1]
s = pd.Categorical(s, categories=cats, ordered=True)
# codes are 0-based positions in cats; adding 1 gives the range 1..13
s.codes + 1
# array([ 7, 13,  4,  3,  3, 13,  6,  4,  7, 13, 11,  1,  9,  4,  2, 13, 13,
#         2, 12,  8], dtype=int8)
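The same approach applied to a DataFrame column might look like the sketch below, assuming the column is named `grades` as in the question; the output column `grade_value` and the sample dates and names are hypothetical. Since the conversion is element-wise on the column, a (date, name) MultiIndex on the rows does not change anything.

```python
import pandas as pd

# Hypothetical MultiIndex data shaped like the question's EDIT (date, name)
index = pd.MultiIndex.from_tuples(
    [("2020-01-01", "billy"), ("2020-01-01", "bob"), ("2020-02-01", "joe")],
    names=["date", "name"],
)
df = pd.DataFrame({"grades": ["A+", "C-", "F"]}, index=index)

# Worst-to-best order, so F -> code 0 and A+ -> code 12
cats = 'A+ A A- B+ B B- C+ C C- D+ D D- F'.split()[::-1]

# Re-declare the column as an ordered categorical with that order;
# this also re-codes a column that is already categorical with another order
df["grades"] = pd.Categorical(df["grades"], categories=cats, ordered=True)

# .cat.codes is the 0-based position in cats; +1 shifts it to 1..13
# (a value not in cats would get code -1, i.e. 0 after the shift)
df["grade_value"] = df["grades"].cat.codes + 1
print(df)
```

Keeping the column as an ordered categorical has a side benefit: comparisons such as `df["grades"] > "C"` and sorting then follow grade order rather than string order.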
Upvotes: 2
Reputation: 365767
What you probably want to do is build a dict, mapping each letter grade to a value.
You can do this explicitly:
gradevalues = {'A+': 13, 'A': 12, …, 'F': 1}
But it's probably better to do it programmatically, because less repetition means fewer places to make a typo:
grades = 'A+ A A- B+ B B- C+ C C- D+ D D- F'.split()
grades.reverse()
gradevalues = {grade: i for i, grade in enumerate(grades, 1)}
assert gradevalues['F'] == 1
assert gradevalues['A+'] == 13
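To apply the dictionary to the DataFrame, `Series.map` looks up each grade. A sketch using the question's example data stored as a categorical column; the output column name `grade_value` is hypothetical. The `astype(str)` step is a choice made here so that `map` returns an ordinary integer column even when `grades` is categorical.

```python
import pandas as pd

# Build the grade -> value mapping programmatically, as in this answer
grades = 'A+ A A- B+ B B- C+ C C- D+ D D- F'.split()
grades.reverse()
gradevalues = {grade: i for i, grade in enumerate(grades, 1)}

# The question's example data, stored as a categorical column
students = ['billy', 'bob', 'joe', 'tom', 'jamal']
df = pd.DataFrame(
    {'grades': pd.Categorical(['A+', 'C-', 'F', 'B', 'D-'])},
    index=students,
)

# Convert to plain strings, then look each grade up in the dict;
# any grade missing from gradevalues would become NaN
df['grade_value'] = df['grades'].astype(str).map(gradevalues)
print(df)
```

`map` keeps the original index, so the same line works unchanged on a (date, name) MultiIndex.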
Upvotes: 3