Reputation: 1
Df <- bball5
str(bball5)
'data.frame': 379 obs. of 9 variables:
$ ID : int 238 239 240 241 242 243 244 245 246 247 ...
$ Sex : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
$ Sport : Factor w/ 10 levels "BBall","Field",..: 1 1 1 1 1 1 1 1 1 1
$ Ht : num 196 190 178 185 185 ...
$ Wt : num 78.9 74.4 69.1 74.9 64.6 63.7 75.2 62.3 66.5 62.9 ...
$ BMI : num 20.6 20.7 21.9 21.9 19 ...
$ BMIc : NA NA NA NA NA NA NA NA NA NA ...
$ Sex_f : Factor w/ 1 level "female": 1 1 1 1 1 1 1 1 1 1 ...
$ Sex_m : Factor w/ 1 level "male": NA NA NA NA NA NA NA NA NA NA ...
I would like to classify a numeric variable within a large dataset of about 1000 observations.
I need to classify BMI into the following ranges:
(<18.50, 18.50-24.99, 25.00-29.99, >=30.00)
and label them respectively as:
"Underweight" "Normal" "Overweight" "Obese"
I then want to produce tables showing the relationships separately for:
$ males
$ females
according to sport types.
I also need to confirm that the BMI has been calculated correctly, as I am finding it difficult to write a formula within the dataset for a new variable column
$ BMIc.
Several variables contain missing values (NA), which give me errors when I try to calculate the new variable:
bball5$BMIc <- bball5$BMI[bball5$BMI, c(bball5$wt/(bball5$Ht)^2 ]
I am unable to classify the BMI values. I must also keep the ID so that the rows still match.
Upvotes: 0
Views: 2223
Reputation: 3429
I would use cut() to transform BMI into a categorical variable. An example on a random BMI vector:
BMI <- runif(100, 16, 35)
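# right=FALSE gives [0,18.5), [18.5,25), [25,30), [30,Inf), so a BMI of exactly 30 is "Obese"
BMIc <- cut(BMI, breaks=c(0, 18.5, 25, 30, Inf), right=FALSE,
labels=c("Underweight", "Normal", "Overweight", "Obese"))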
To check the result, you could use aggregate():
aggregate(BMI, by=list(BMIc), summary)
Finally, the new vector can be included in the data frame with a command such as df$BMIc <- BMIc.
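Here is a minimal sketch of how this could look on a data frame shaped like the one in the question, including the per-sex tables by sport. The toy data frame `athletes` and its values are made up for illustration, the column name `BMIclass` is my own choice, and `right = FALSE` is an assumption on my part so that the boundaries match the "<18.50" and ">=30.00" ranges asked for.

```r
# Toy data standing in for bball5; all values are invented for illustration
athletes <- data.frame(
  ID    = 1:8,
  Sex   = factor(c("female", "female", "female", "female",
                   "male", "male", "male", "male")),
  Sport = factor(c("BBall", "BBall", "Field", "Field",
                   "BBall", "BBall", "Field", "Field")),
  BMI   = c(17.9, 21.4, 25.0, NA, 18.5, 27.3, 30.0, 33.1)
)

# right = FALSE makes the intervals [0, 18.5), [18.5, 25), [25, 30), [30, Inf)
athletes$BMIclass <- cut(athletes$BMI,
                         breaks = c(0, 18.5, 25, 30, Inf),
                         labels = c("Underweight", "Normal", "Overweight", "Obese"),
                         right  = FALSE)

# One Sport x BMIclass table per sex; table() drops rows with a missing BMI by default
tabs <- lapply(split(athletes, athletes$Sex),
               function(d) table(d$Sport, d$BMIclass))
tabs$female
tabs$male
```

Because the class is stored as a new column, each row keeps its ID, and a missing BMI simply stays NA in the class column.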
Upvotes: 0
Reputation: 5660
You can create a variable named BMIclass and do this to create the 4 categories in it:
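# Start with everyone as "Underweight" (BMI < 18.5), then overwrite the higher ranges
bball5$BMIclass <- "Underweight"
bball5[which(bball5$BMI >= 18.5 & bball5$BMI < 25), 'BMIclass'] <- "Normal"
bball5[which(bball5$BMI >= 25 & bball5$BMI < 30), 'BMIclass'] <- "Overweight"
bball5[which(bball5$BMI >= 30), 'BMIclass'] <- "Obese"
# Rows with a missing BMI should stay missing rather than default to "Underweight"
bball5[is.na(bball5$BMI), 'BMIclass'] <- NA
bball5$BMIclass <- as.factor(bball5$BMIclass)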
As for BMIc, you can do this (below). It will still produce NA wherever Ht or Wt is missing, but it will give you the correct BMIc wherever the data is present.
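# Ht looks like centimetres in str(), so convert to metres; note the column is Wt, not wt
bball5$BMIc <- bball5$Wt / (bball5$Ht / 100)^2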
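To confirm the stored BMI against a recomputed one, something along these lines may help. This is only a sketch on made-up numbers: `toy` is a hypothetical stand-in for bball5, I am assuming Ht is in centimetres and Wt in kilograms (the values in the str() output look that way), and the 0.05 tolerance is an arbitrary allowance for rounding in the stored BMI.

```r
# Hypothetical stand-in for bball5; the last stored BMI is deliberately wrong
toy <- data.frame(
  ID  = c(238, 239, 240, 241),
  Ht  = c(196, 190, NA, 185),
  Wt  = c(78.9, 74.4, 69.1, 74.9),
  BMI = c(20.5, 20.6, 21.9, 25.0)
)

# Arithmetic on NA returns NA, so rows with a missing Ht or Wt just stay NA
toy$BMIc <- toy$Wt / (toy$Ht / 100)^2

# Absolute difference between the stored and the recomputed BMI
toy$diff <- abs(toy$BMI - toy$BMIc)

# IDs whose stored BMI disagrees with the recomputed one; which() skips NA rows
mismatch <- toy$ID[which(toy$diff > 0.05)]
mismatch
```

Since everything is done column-wise on the same data frame, the ID column stays aligned with the results, and no function or loop is needed to get around the NAs.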
Upvotes: 1