Reputation: 683
I have a data frame where each column is of type factor and has over 3000 levels. Is there a way to replace each level with a numeric value? Consider the built-in data frame InsectSprays:
> str(InsectSprays)
'data.frame': 72 obs. of 2 variables:
$ count: num 10 7 20 14 14 12 10 23 17 20 ...
$ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
The replacement should be as follows:
A=1,B=2,C=3,D=4,E=5,F=6.
If there are 3000 levels:
"USA"=1, "UK"=2, ..., "France"=3000.
The solution should automatically detect the number of levels (e.g., 3000), then replace each level with a number from 1 to 3000.
Upvotes: 3
Views: 16547
Reputation: 11
Based on this tutorial (https://statisticsglobe.com/how-to-convert-a-factor-to-numeric-in-r/), I have used the following code to convert factor levels into specific numbers:
levels(InsectSprays$spray) # to check the order levels are stored
levels(InsectSprays$spray) <- c(0, 1, 2, 3, 4, 5) # assign the number I want to each level
InsectSprays$spray <- as.numeric(as.character(InsectSprays$spray)) # to change from factor to numeric
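The as.character() step in the last line matters: calling as.numeric() directly on a factor returns the internal integer codes rather than the level labels. A minimal sketch, assuming a made-up factor `f` that is not part of the question (the names `f`, `codes` and `labels` are illustrative only):

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "20", "10", "30"))

# Internal codes: the position of each value in levels(f)
codes <- as.numeric(f)

# The labels themselves, parsed as numbers
labels <- as.numeric(as.character(f))

print(codes)   # 1 2 1 3
print(labels)  # 10 20 10 30
```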
Upvotes: 1
Reputation: 93761
Factor variables already have underlying numeric values corresponding to each factor level. You can see this as follows:
as.numeric(InsectSprays$spray)
or
x = factor(c("A","D","B","G"))
as.numeric(x)
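For a data frame with many factor columns, the same idea can probably be applied to every factor column at once, without typing out any levels. A rough sketch, assuming you work on a copy called `df` (a name introduced here for illustration) and that the default level order is the numbering you want:

```r
# Work on a copy so the built-in data set is left untouched
df <- InsectSprays

# Flag the factor columns
is_fac <- vapply(df, is.factor, logical(1))

# Replace each factor column with its integer codes (1 .. number of levels)
df[is_fac] <- lapply(df[is_fac], as.integer)

str(df)
```

Non-factor columns such as count are left alone, and the number of levels never has to be stated explicitly.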
If you want to add specific numeric values corresponding to each level, you can, for example, merge in those values from a lookup table:
# Create a lookup table with the numeric values you want to correspond to each level of spray
lookup = data.frame(spray=levels(InsectSprays$spray), sprayNumeric=c(5,4,1,2,3,6))
# Merge lookup values into your data frame
InsectSprays = merge(InsectSprays, lookup, by="spray")
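Note that merge() sorts the result by the key column by default, so the original row order is not guaranteed to survive. If row order matters, one possible alternative is a named-vector lookup. This is only a sketch: `df` and `lookup_vec` are names made up here, and the values mirror the lookup table above.

```r
# Work on a copy so the built-in data set is left untouched
df <- InsectSprays

# Named vector: names are the factor levels, values are the numbers wanted
lookup_vec <- c(A = 5, B = 4, C = 1, D = 2, E = 3, F = 6)

# Index by the level label of each row; row order is unchanged
df$sprayNumeric <- unname(lookup_vec[as.character(df$spray)])

head(df)
```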
Upvotes: 6
Reputation: 640
For the InsectSprays example, you can use:
levels(InsectSprays$spray) <- 1:6
This should generalize to your problem.
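One caveat: assigning to levels() only relabels the factor, so the column is still a factor afterwards. A sketch that avoids hard-coding the 6 by using nlevels(), and then converts to integer (the names `df` and `n` are illustrative, not from the question):

```r
# Work on a copy so the built-in data set is left untouched
df <- InsectSprays

# Detect the number of levels instead of typing it in
n <- nlevels(df$spray)

# Relabel the levels as "1", "2", ..., "n"; the column is still a factor here
levels(df$spray) <- seq_len(n)
still_factor <- is.factor(df$spray)

# Convert the new labels to plain integers
df$spray <- as.integer(as.character(df$spray))

str(df)
```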
Upvotes: 7