Reputation: 11
"data" is a data.frame with 10 numeric variables. I want to convert all of them into categorical variables with 6 percentile groups (below 5%, 5% to 25%, 25% to 50%, 50% to 75%, 75% to 95%, above 95%). I want to do this with a function so I can categorize all the variables at once.
Without a function I can only do it as below, so I have to repeat the same code over and over.
m1<- quantile(data$val, 0.05)
m2<- quantile(data$val, 0.25)
m3<- quantile(data$val, 0.5)
m4<- quantile(data$val, 0.75)
m5<- quantile(data$val, 0.95)
data$val[data$val<m1] = "below0.05"
data$val[data$val>= m1& data$val<m2 ] = "0.05to0.25"
data$val[data$val>= m2& data$val<m3 ] = "0.25to0.5"
data$val[data$val>= m3& data$val<m4 ] = "0.5to0.75"
data$val[data$val>= m4& data$val<m5 ] = "0.75to0.95"
data$val[data$val>= m5] = "upper0.95"
data$val <-as.factor(data$val)
I tried some code with lapply() and function(data, name):
fun =function(data, name) {
y <-get(name,data)
m1<- quantile(name,data, 0.05)
m2<- quantile(name,data, 0.25)
m3<- quantile(name,data, 0.5)
m4<- quantile(name,data, 0.75)
m5<- quantile(name,data, 0.95)
RB = rbind(m1, m2, m3, m4, m5)
dimnames(RB)[[2]] = "Value"
name$data[ name$data<m1] = "below0.05"
name$data[ name$data>= m1& name$data<m2 ] = "0.05to0.25"
name$data[ name$data>= m2& name$data<m3 ] = "0.25to0.5"
name$data[ name$data>= m3& name$data<m4 ] = "0.5to0.75"
name$data[ name$data>= m4& name$data<m5 ] = "0.75to0.95"
name$data[ name$data>= m5] = "upper0.95"
name$data <-as.factor(name$data)
}
It only works partway through. I want to know how to make it right. Plus, I want to know how to apply lapply() here so that I can categorize all the variables easily. Please, anyone help!
Error in `$<-.data.frame`(`*tmp*`, "name", value = character(0)) :
replacement has 0 rows, data has 301
In addition: Warning messages:
1: Unknown or uninitialised column: 'name'.
Upvotes: 1
Views: 1325
Reputation: 388797
We can use cut() to divide the data into breaks computed with quantile(), and use lapply() to apply it to multiple columns. So something like this should work for the first 10 columns.
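# Inf closes the top interval so values above the 95th percentile get a label too
lapply(df[1:10], function(x) cut(x,
breaks = c(-Inf, quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95)), Inf),
labels = c("below0.05", "0.05to0.25", "0.25to0.5", "0.5to0.75", "0.75to0.95", "upper0.95"),
right = FALSE))  # [lower, upper) intervals, matching the >= and < tests in the question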
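For a version that can be run end to end, here is a minimal sketch on made-up data. The helper name percentile_groups and the random data frame are only assumptions for illustration. It also shows looking up a column by a name stored in a string with [[ ]]; the "Unknown or uninitialised column: 'name'" warning in the question looks consistent with $ searching for a column literally called name.

```r
# Hypothetical helper: bin one numeric vector into the six percentile groups.
percentile_groups <- function(x) {
  probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
  cut(x,
      breaks = c(-Inf, quantile(x, probs, na.rm = TRUE), Inf),
      labels = c("below0.05", "0.05to0.25", "0.25to0.5",
                 "0.5to0.75", "0.75to0.95", "upper0.95"),
      right = FALSE)  # intervals are [lower, upper), matching >= and <
}

# Made-up data for illustration: 301 rows, 10 numeric columns (V1 to V10).
set.seed(1)
data <- as.data.frame(matrix(rnorm(301 * 10), ncol = 10))

# Replace every numeric column with its factor version in one step.
num_cols <- names(data)[sapply(data, is.numeric)]
data[num_cols] <- lapply(data[num_cols], percentile_groups)

# A column whose name is held in a string is accessed with [[ ]], not $.
name <- "V1"
table(data[[name]])
```

Note that cut() stops with a "breaks are not unique" error if two quantiles of a column coincide (for example, a column with many tied values), so such columns would need separate handling.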
Upvotes: 3