elmo

Reputation: 1

Categorising numerical and categorical variables into appropriate ranges in R

 Df <- bball5
 str(bball5)
 'data.frame':  379 obs. of 9 variables:
 $ ID         : int  238 239 240 241 242 243 244 245 246 247 ...
 $ Sex        : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sport      : Factor w/ 10 levels "BBall","Field",..: 1 1 1 1 1 1 1 1 1 1 
 $ Ht         : num  196 190 178 185 185 ...
 $ Wt         : num  78.9 74.4 69.1 74.9 64.6 63.7 75.2 62.3 66.5 62.9 ...
 $ BMI        : num  20.6 20.7 21.9 21.9 19 ...
 $ BMIc       : NA NA NA NA NA NA NA NA NA NA ...
 $ Sex_f      : Factor w/ 1 level "female": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sex_m      : Factor w/ 1 level "male": NA NA NA NA NA NA NA NA NA NA ...

I would like to classify a set of numerical variables within a large dataset of around 1000 observations.

I need to classify BMI into the following ranges:

    (<18.50, 18.50-24.99, 25.00-29.99, >=30.00) 

and label them respectively as:

  "Underweight" "Normal" "Overweight" "Obese" 

I want to produce tables that show these relationships separately for males and females, according to sport type.

I also need to confirm that the existing BMI was calculated correctly, but I am finding it difficult to write a formula that creates the new column BMIc within the dataset.

There are several missing values (NA) in each variable, and they give me errors when I try to write a function that calculates the new variable:

 bball5$BMIc <- bball5$BMI[bball5$BMI, c(bball5$wt/(bball5$Ht)^2 ]

I am unable to classify the BMI values. I must also keep the ID column so that rows can still be matched.

Upvotes: 0

Views: 2223

Answers (2)

Vincent Guillemot

Reputation: 3429

I would use cut to transform BMI into a categorical variable. An example on a random BMI vector:

BMI <- runif(100, 16, 35)
BMIc <- cut(BMI, breaks=c(0, 18.5, 25, 30, +Inf), 
            labels=c("Underweight", "Normal", "Overweight", "Obese"),
            right=FALSE)  # left-closed intervals, so 18.5 is "Normal" and 30 is "Obese"

To check the result, you could use aggregate:

aggregate(BMI, by=list(BMIc), summary)

Finally, the new vector can be added to the data frame, for example with df$BMIc <- BMIc.
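To connect this to the tables requested in the question, here is a minimal sketch of applying cut inside a data frame and cross-tabulating the result by sport for each sex. The data frame toy and its values are made up for illustration; only the column names (ID, Sex, Sport, BMI) follow the question, and BMIclass is a hypothetical name for the new column.

```r
# Hypothetical toy data standing in for bball5 (values are made up)
toy <- data.frame(
  ID    = 1:8,
  Sex   = factor(c("female", "female", "male", "male",
                   "female", "male", "male", "female")),
  Sport = factor(c("BBall", "Field", "BBall", "Field",
                   "BBall", "BBall", "Field", "Field")),
  BMI   = c(17.9, 18.5, 24.99, 25, 29.9, 30, NA, 22.3)
)

# right = FALSE makes the intervals closed on the left: [18.5, 25), [25, 30), ...
# A missing BMI stays NA in the new column instead of raising an error.
toy$BMIclass <- cut(toy$BMI,
                    breaks = c(0, 18.5, 25, 30, Inf),
                    labels = c("Underweight", "Normal", "Overweight", "Obese"),
                    right  = FALSE)

# Three-way count table; by default table() leaves out rows with NA
tabs <- table(toy$Sport, toy$BMIclass, toy$Sex,
              dnn = c("Sport", "BMIclass", "Sex"))

# One Sport x BMIclass table per sex
tabs[, , "female"]
tabs[, , "male"]
```

Because the new column is added to the existing data frame, the ID column stays aligned with each classification.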

Upvotes: 0

Gaurav Bansal

Reputation: 5660

You can create a variable named BMIclass and do this to create the 4 categories in it:

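bball5$BMIclass <- NA_character_  # rows with a missing BMI stay NA
bball5[which(bball5$BMI < 18.5), 'BMIclass'] <- "Underweight"
bball5[which(bball5$BMI >= 18.5 & bball5$BMI < 25), 'BMIclass'] <- "Normal"
bball5[which(bball5$BMI >= 25 & bball5$BMI < 30), 'BMIclass'] <- "Overweight"
bball5[which(bball5$BMI >= 30), 'BMIclass'] <- "Obese"
bball5$BMIclass <- factor(bball5$BMIclass, levels = c("Underweight", "Normal", "Overweight", "Obese"))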

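As for BMIc, you can do this (below). It will still give NA wherever Ht or Wt is missing, but it will give you the correct BMIc where there is data for it. Note that R is case-sensitive, so the column is Wt, not wt, and that Ht appears to be in centimetres, so it is converted to metres first.

bball5$BMIc <- bball5$Wt / (bball5$Ht / 100)^2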
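To confirm that an existing BMI column agrees with the recalculated one, a rough check along the following lines may help. This is only a sketch: the data frame toy and its values are invented, Ht in centimetres and Wt in kilograms are assumptions based on the str output in the question, and the 0.05 tolerance is an arbitrary allowance for rounding.

```r
# Hypothetical toy data; units (cm, kg) are assumed, values are made up
toy <- data.frame(
  ID  = c(238, 239, 240, 241),
  Ht  = c(196, 190, NA, 185),
  Wt  = c(78.9, 74.4, 69.1, NA),
  BMI = c(20.54, 20.61, 21.9, 21.9)
)

# Vectorised formula: an NA in Ht or Wt simply propagates to NA in BMIc
toy$BMIc <- toy$Wt / (toy$Ht / 100)^2

# Compare only rows where both values exist
ok <- !is.na(toy$BMIc) & !is.na(toy$BMI)

# Rows whose stored BMI differs from the recalculated one by more than the tolerance
mismatch <- toy[ok & abs(toy$BMIc - toy$BMI) > 0.05, c("ID", "BMI", "BMIc")]
mismatch
```

An empty mismatch data frame suggests the stored BMI values agree with the formula; any rows it does return keep their ID, so they can be traced back to the original records.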

Upvotes: 1
