Reputation: 12856
Let's say I have a factor variable with numerous levels and I am trying to group them into several groups.
> levels(dat$years_continuously_insured_order2)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18"
[19] "19" "20"
> levels(dat$age_of_oldest_driver)
[1] "-16" "1" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
[22] "34" "35" "36" "37" "38" "39" "40
I have a script which runs through these variables and groups them into several categories. However, the number of levels could (and usually is) different each time my script runs. Therefore, if my original code to group the variables was the following (see below), it wouldn't be of use if in an hour later, my script runs and the levels are different. Instead of 15 levels, I could now have 25 levels and the values are different, but I still need to group them into specific categories.
dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)
How can I find a more elegant way to group variables into segments? Are there better ways to do this in R?
Thanks!
Upvotes: 1
Views: 230
Reputation: 263301
You could convert your factor levels in the continuously insured variable to numeric and then cut to your categories and re-factor(). The first step is described in the R-FAQ (to do properly it's a two step process):
dat$years_cont <- factor( cut( as.numeric(as.character(
dat$years_continuously_insured_order2)),
breaks=c(0,2,3, Inf), right=FALSE ),
labels=c( "1 or less", "2", "3 +")
)
#-----------------
> str(dat)
'data.frame': 100 obs. of 2 variables:
$ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
$ years_cont : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
Upvotes: 2
Reputation: 78590
If your original column is a number, treat it as a number, not a factor. A much easier way to do what you're doing is:
bin.value = function(x) {
ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(dat$years_continuously_insured)))
Upvotes: 0