I don't understand the difference between as.numeric and as.factor in R. When do I want to use each?
Example code:
data2$Response <- as.factor(data2$Response)
data2$VOT <- as.factor(data2$VOT)
data2$Block <- as.factor(data2$Block)
What will this do, exactly?
Factors (created with as.factor) are variables that take a fixed set of discrete values, called levels, which may or may not be ordered. Outside R they're usually called categorical variables. For example, North, South, East and West could be the levels of a factor.
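A minimal sketch of what as.factor produces, using the compass directions (the vectors `directions` and `sizes` are illustrative, not from the question). The distinct values become levels, each element is stored internally as an integer code pointing at its level, and `factor(..., ordered = TRUE)` can be used when the categories have a natural order.

```r
# Illustrative character vector of compass directions
directions <- c("North", "South", "East", "West", "North")
dir_factor <- as.factor(directions)

# Levels are the distinct values, sorted alphabetically by default
print(levels(dir_factor))        # "East" "North" "South" "West"

# Each element is stored as an integer code that points at its level
print(as.integer(dir_factor))    # 2 3 1 4 2

# An ordered factor, for categories that do have a natural order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
print(sizes < "large")           # TRUE FALSE TRUE
```

For unordered factors such as the directions, comparisons like `<` are not meaningful; only the ordered factor supports them.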
Numerics (created with as.numeric) are numbers, with infinitely many other numbers between any two of them. For example, 5 is a number, as is 6, but so are 5.01, 5.001, 5.0001 and so on.
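To see how the two types behave differently in practice, here is a small sketch that stores the same values once as numbers and once as a factor (`as_numbers` and `as_categories` are illustrative names). Statistics such as the mean operate on the numeric values, while summary() on the factor simply counts how often each level occurs.

```r
# Illustrative values, stored once as numbers and once as a factor
as_numbers <- c(5, 6, 5.01, 5.001)
as_categories <- as.factor(as_numbers)

# Numeric: statistics work on the values themselves
print(mean(as_numbers))
print(summary(as_numbers))

# Factor: each distinct value is only a label, so summary() counts them
print(summary(as_categories))
print(levels(as_categories))     # "5" "5.001" "5.01" "6"
```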
To build a reproducible example similar to yours:
data2 <- data.frame(numbers = c(1,2,3,4), text = c("one", "two", "three", "four"))
  numbers  text
1       1   one
2       2   two
3       3 three
4       4  four
I can use the numbers column to do math:
library(dplyr)
data2 %>%
  mutate(square = numbers * numbers)
  numbers  text square
1       1   one      1
2       2   two      4
3       3 three      9
4       4  four     16
If I convert numbers to a factor using as.factor, though:
data2$numbers <- as.factor(data2$numbers)
I'll no longer be able to do math (like squaring) with the values in data2$numbers, because they're not numeric anymore: R returns NA with a warning that * is not meaningful for factors. They're factor levels named 1, 2, 3 and 4, not the numbers 1, 2, 3 and 4. They could just as easily be named North, South, East and West, and West * West doesn't make any sense.
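A sketch of what actually happens when you try the math after the conversion. It rebuilds the example data frame from above without dplyr and uses withCallingHandlers() only so the warning is printed and then muffled; that handler is an illustrative choice, not something the conversion requires.

```r
# Rebuild the example data frame and convert numbers to a factor
data2 <- data.frame(numbers = c(1, 2, 3, 4),
                    text = c("one", "two", "three", "four"))
data2$numbers <- as.factor(data2$numbers)

# Arithmetic on a factor does not stop with an error: R emits a warning
# and returns NA for every element. Report the warning, then continue.
squared <- withCallingHandlers(
  data2$numbers * data2$numbers,
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(squared)                   # NA NA NA NA
print(levels(data2$numbers))     # "1" "2" "3" "4": labels, not numbers
```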
So, to sum up: use as.numeric when the values you're passing really are numbers, perhaps stored as strings ("1", "2", "3", "4"), or when they're something you'd like to represent as numbers (TRUE and FALSE values, for example). Use as.factor when you'd like to turn the values into named categories, which may or may not have an order.
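A short sketch of the as.numeric cases from the summary, plus one caveat worth knowing if you convert back after using as.factor: calling as.numeric directly on a factor returns the internal level codes, not the labels. Going through as.character first recovers the original values. The vector `f` is illustrative.

```r
# Numbers stored as text: as.numeric() parses them
from_text <- as.numeric(c("1", "2", "3", "4"))
print(from_text)                 # 1 2 3 4

# Logical values: TRUE becomes 1, FALSE becomes 0
from_logical <- as.numeric(c(TRUE, FALSE, TRUE))
print(from_logical)              # 1 0 1

# Caveat: levels are sorted as text ("10" < "20" < "5"),
# and as.numeric() on a factor returns those level codes
f <- as.factor(c("10", "20", "5"))
codes <- as.numeric(f)
print(codes)                     # 1 2 3, not 10 20 5

# Convert the labels to text first to get the original numbers back
values <- as.numeric(as.character(f))
print(values)                    # 10 20 5
```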
Does that answer your question?