Reputation: 313
I would like to create class intervals in the histogram according to the Body mass index (BMI) classification, and color the columns. The categories are:
Underweight (Severe thinness) < 16.0 -> color: red
Underweight (Moderate thinness) 16.0 – 16.9 -> color: orange
Underweight (Mild thinness) 17.0 – 18.4 -> color: pink
Normal range 18.5 – 24.9 -> color: green
Overweight (Pre-obese) 25.0 – 29.9 -> color: blue
Obese (Class I) 30.0 – 34.9 -> color: pink
Obese (Class II) 35.0 – 39.9 -> color: orange
Obese (Class III) ≥ 40.0 -> color: red
I tried the code below, but it plots the density on the y-axis, and the x-axis does not show the class limits properly. How can I plot the frequency on the y-axis and the class interval limits on the x-axis of the histogram?
Height <- c(1.72, 1.86, 2.1, 1.7, 1.6, 1.67, 1.59, 1.88, 1.7, 1.72, 1.9, 1.88,
1.59, 1.55, 1.91, 1.61, 1.82, 1.66, 1.77, 1.74)
Weight <- c(77, 79, 102, 70, 63, 62, 55, 89, 88, 88, 128, 100, 55, 60, 79, 59,
57, 70, 72, 74)
BMI <- Weight/Height^2
class_range <- c(16, 16.9, 18.4, 24.9, 29.9, 34.9, 39.9)
hist(BMI, freq=TRUE, main="", breaks=class_range,
col=c("red", "orange", "pink", "green", "blue", "pink", "orange", "red"))
Upvotes: 0
Views: 63
Reputation: 72583
Probably you are looking for a bar chart, which is often confused with a histogram.
First, to cover all eight classes, extend class_range with 0 and Inf. Then cut BMI along these breaks and barplot the table.
> cut(BMI, breaks=c(0, class_range, Inf)) |> table() |>
+ barplot(col=c("red", "orange", "pink", "green", "blue", "pink", "orange", "red"))
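One detail worth checking is which class a boundary value falls into. By default cut uses right-closed intervals, so a BMI of exactly 25 would be counted in the class that ends at 25. The following is a minimal sketch, assuming the lower bounds of the classification as breaks and right=FALSE so that each class is [lower, upper); the names bmi_breaks, x and bmi_class are illustrative and not part of the answer.

```r
# Lower bounds of the eight BMI classes, padded with 0 and Inf (assumed breaks)
bmi_breaks <- c(0, 16, 17, 18.5, 25, 30, 35, 40, Inf)

# A few hand-picked boundary values to show where they land
x <- c(15.9, 16, 18.5, 24.97, 25, 40)

# right = FALSE makes the intervals left-closed: [lower, upper)
bmi_class <- cut(x, breaks = bmi_breaks, right = FALSE)

# cut() returns a factor with all eight levels, so table() keeps empty classes
counts <- table(bmi_class)
print(counts)
```

With these breaks, 24.97 is counted as normal range and 25 as overweight, which matches the classification in the question.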
If you use a named classes vector, i.e. one where the breaks are named,
> classes <- c('nul'=0,
+ 'Underweight\n(Severe thinness)\n<16.0'=16,
+ 'Underweight\n(Moderate thinness)\n16.0 – 16.9'=17,
+ 'Underweight\n(Mild thinness)\n17.0 – 18.4'=18.5,
+ 'Normal range\n18.5 – 24.9'=25,
+ 'Overweight\n(Pre-obese)\n25.0 – 29.9'=30,
+ 'Obese\n(Class I)\n30.0 – 34.9'=35,
+ 'Obese\n(Class II)\n35.0 – 39.9'=40,
+ 'Obese\n(Class III)\n≥ 40.0'=Inf)
you can use the labels= argument of cut and make it look a little more sophisticated. (We use the padj parameter to shift the names of the bars down a little, which throws a warning; I am not sure why.) To illustrate this, I use other, simulated data below.
> cols <- c("red", "orange", "pink", "green", "blue", "pink", "orange", "red")
> cut(bmi, breaks=classes, labels=names(classes)[-1]) |> table() |>
+ barplot(col=cols, border=cols, padj=.5, cex.names=.8, ylab='Frequency')
> mtext('BMI', 1, 3.5)
Alternatively, explicitly state freq=TRUE in hist. This throws the warning shown below, which might indicate why freq is deactivated by default when the breaks are not equidistant.
> hist(bmi, freq=TRUE, main="", breaks=class_range,
+ col=c("red", "orange", "pink", "green", "blue", "pink", "orange", "red"))
Warning message:
In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle, :
the AREAS in the plot are wrong -- rather use 'freq = FALSE'
Data:
set.seed(42)
bmi <- rgamma(1e3, 38, 1.42)
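To see what the hist warning is about, it can help to compare counts and densities without plotting. The sketch below reuses the simulated bmi from above; the break vector brk and its finite upper limit are assumptions made for illustration, since hist needs finite breaks that span the data.

```r
set.seed(42)
bmi <- rgamma(1e3, 38, 1.42)

# Finite breaks spanning the data; the upper limit is an assumption
brk <- c(0, 16, 17, 18.5, 25, 30, 35, 40, max(45, ceiling(max(bmi))))

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(bmi, breaks = brk, plot = FALSE)

# Counts are the raw frequencies per class
print(h$counts)

# Densities divide each count by n and by the class width, so that
# the bar areas (density * width) sum to 1
widths <- diff(h$breaks)
print(h$density)
print(sum(h$density * widths))
```

Because the class widths differ, bars drawn with counts as heights have areas that are no longer proportional to the frequencies, which is what the warning refers to. A bar chart of the counts avoids this.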
Upvotes: 0
Reputation: 18493
By categorizing the BMI values into groups, you are effectively creating a categorical variable. Histograms are not the best plot to visualise this type of data. Try a bar chart, as shown below using base R and the ggplot2 package.
bmi.labels <- c("Underweight (severe)", "Underweight (moderate)", "Underweight (mild) ",
"Normal",
"Overweight",
"Obese (class I)", "Obese (class II)", "Obese (class III)")
bmi.gp <- cut(BMI, breaks=c(0, class_range, Inf), labels=bmi.labels)
bmi.cols <- c("red", "orange","pink","green","blue","pink","orange","red")
A (base R)
par(las=1, mar=c(4,10,1,1))
barplot(table(bmi.gp), col=bmi.cols, xlab="Frequency", horiz = TRUE, xlim=c(0,12))
mtext("BMI group", at=-3)
grid(ny = NA)
B (ggplot)
library(ggplot2)
library(dplyr)  # provides mutate() and count()
data.frame(bmi.gp) |>
  mutate(bmi.gp=factor(bmi.gp, levels=bmi.labels)) |>
  count(bmi.gp, .drop=FALSE) |>  # .drop=FALSE keeps empty BMI groups
  ggplot(aes(x=bmi.gp, y=n, fill=bmi.gp)) +
  geom_col(show.legend=FALSE) +
  scale_fill_manual(values=bmi.cols) +
  theme_light() +
  labs(y="Frequency", x="BMI group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Upvotes: 0