Jeremy Teitelbaum

Reputation: 196

Right way to convert levels of column in a dataframe to numeric values in R?

I have a dataframe with a column that contains levels "Excellent, Very Good, Good, Fair, Poor." I would like to average these values, and work with them in other ways, by assigning the value 5 to "Excellent", 4 to "Very Good", and so on.

My various attempts are confounded by the fact that the default assignment of numerical values seems to take the levels in alphabetical order, so that "Excellent" is 1, "Fair" is 2, and so on.

Thanks for the help.

Upvotes: 0

Views: 246

Answers (2)

symbiotic

Reputation: 373

Do you need it to be an ordered factor? If so, using factor() may be your best bet.

Sample data

column <- c("Excellent", "Very Good", "Good", "Fair", "Poor",
        "Good", "Fair", "Poor")


col.f <- factor(column,
            levels = c("Poor","Fair" , "Good" , "Very Good", "Excellent"),
            labels = c("Poor","Fair" , "Good" , "Very Good", "Excellent"),
            ordered = TRUE)

col.f
[1] Excellent Very Good Good      Fair      Poor      Good      Fair      Poor     
Levels: Poor < Fair < Good < Very Good < Excellent

Then you can call as.numeric(col.f) to get the numeric values. They follow the level order, so "Poor" is 1, "Fair" is 2, and so on up to "Excellent" at 5.
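As a minimal sketch of how this might look on a data frame column, the example below converts the column and averages it. The data frame `df` and the column names `rating` and `rating_num` are made up for illustration; they are not from the question.

```r
# Hypothetical data frame; the column name `rating` is an assumption
df <- data.frame(
  rating = c("Excellent", "Very Good", "Good", "Fair", "Poor",
             "Good", "Fair", "Poor"),
  stringsAsFactors = FALSE
)

# Declare the levels from worst to best so the integer codes
# run from 1 (Poor) to 5 (Excellent) instead of alphabetically
rating_levels <- c("Poor", "Fair", "Good", "Very Good", "Excellent")
df$rating <- factor(df$rating, levels = rating_levels, ordered = TRUE)

# as.numeric() on a factor returns the integer codes in level order
df$rating_num <- as.numeric(df$rating)

# Average of 5, 4, 3, 2, 1, 3, 2, 1
mean(df$rating_num)
```

Keeping the factor column alongside the numeric one means the labels remain available for tables and plots, while the numeric column is used for arithmetic.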

Upvotes: 2

Paul Hiemstra

Reputation: 60944

I'd use a named vector as lookup table:

options = c('Excellent' = 5, 'Very Good' = 4, 'Good' = 3, 'Fair' = 2, 'Poor' = 1)
df = data.frame(grade = sample(names(options), 100, replace = TRUE))
head(df)
      grade
1 Very Good
2      Good
3 Excellent
4 Very Good
5      Fair
6      Good

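df = within(df, {
    # Look up by label: before R 4.0, data.frame() stores strings as factors,
    # and indexing by a factor would use its integer codes instead
    grade_numeric = unname(options[as.character(grade)])
})
head(df)
      grade grade_numeric
1 Very Good             4
2      Good             3
3 Excellent             5
4 Very Good             4
5      Fair             2
6      Good             3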
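One thing to watch with a named-vector lookup: if the column is a factor (the default for data.frame() before R 4.0), indexing with it uses the factor's integer codes, which follow alphabetical level order, rather than its labels. The sketch below contrasts the two; the names `scores` and `grades` are illustrative only.

```r
# Lookup table from label to score
scores <- c("Excellent" = 5, "Very Good" = 4, "Good" = 3, "Fair" = 2, "Poor" = 1)

# A factor column; its levels default to alphabetical order
grades <- factor(c("Very Good", "Good", "Excellent", "Very Good", "Fair", "Good"))

# Indexing by the factor itself picks elements by integer code (position)
by_code <- unname(scores[grades])

# Converting to character first picks elements by name, as intended
by_name <- unname(scores[as.character(grades)])

by_name        # 4 3 5 4 2 3
mean(by_name)  # 3.5
```

Wrapping the index in as.character() is harmless when the column is already a character vector, so it is a reasonable default in either case.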

Upvotes: 2
