Marcel

Reputation: 313

Different class intervals (breaks) according to specific values in the histogram

I would like to create class intervals in the histogram according to the Body mass index (BMI) classification, and color the columns. The categories are:

Underweight (Severe thinness) < 16.0 -> color: red

Underweight (Moderate thinness) 16.0 – 16.9 -> color: orange

Underweight (Mild thinness) 17.0 – 18.4 -> color: pink

Normal range 18.5 – 24.9 -> color: green

Overweight (Pre-obese) 25.0 – 29.9 -> color: blue

Obese (Class I) 30.0 – 34.9 -> color: pink

Obese (Class II) 35.0 – 39.9 -> color: orange

Obese (Class III) ≥ 40.0 -> color: red

I tried the code below, but it plots the density on the y-axis, and the x-axis does not show the class limits properly. How can I plot the frequency on the y-axis and the class interval limits on the x-axis of the histogram?

Height <- c(1.72, 1.86, 2.1, 1.7, 1.6, 1.67, 1.59, 1.88, 1.7, 1.72, 1.9, 1.88,
            1.59, 1.55, 1.91, 1.61, 1.82, 1.66, 1.77, 1.74)
Weight  <- c(77, 79, 102, 70, 63, 62, 55, 89, 88, 88, 128, 100, 55, 60, 79, 59,
             57, 70, 72, 74)
BMI <- Weight/Height^2
class_range <- c(16, 16.9, 18.4, 24.9, 29.9, 34.9, 39.9)
hist(BMI, freq=TRUE, main="", breaks=class_range, 
     col=c("red",  "orange", "pink", "green", "blue", "pink", "orange", "red"))

Upvotes: 0

Views: 63

Answers (2)

jay.sf

Reputation: 72583

Probably you are looking for a bar chart, which is often confused with a histogram.

First, to get the correct class_range, define the breaks along the lower bounds of the classes and add 0 and Inf as outer limits. Then cut BMI along class_range and barplot the resulting table.

> cut(BMI, breaks=class_range) |> table() |> 
+   barplot(col=c("red",  "orange", "pink", "green", "blue", "pink", "orange", "red"))
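
The breaks described above are not spelled out, so here is a minimal sketch of one way to build them from the question's data. The break values and the use of right=FALSE are assumptions based on the BMI table in the question, not necessarily what was used for the plot.

```r
# Data from the question
Height <- c(1.72, 1.86, 2.1, 1.7, 1.6, 1.67, 1.59, 1.88, 1.7, 1.72, 1.9, 1.88,
            1.59, 1.55, 1.91, 1.61, 1.82, 1.66, 1.77, 1.74)
Weight <- c(77, 79, 102, 70, 63, 62, 55, 89, 88, 88, 128, 100, 55, 60, 79, 59,
            57, 70, 72, 74)
BMI <- Weight / Height^2

# Assumed breaks: lower bound of each BMI class, padded with 0 and Inf
class_range <- c(0, 16, 17, 18.5, 25, 30, 35, 40, Inf)

# right = FALSE makes the intervals left-closed, [lower, upper),
# so a BMI of exactly 25.0 is counted as overweight, not normal
bmi_class <- cut(BMI, breaks = class_range, right = FALSE)
counts <- table(bmi_class)
print(counts)

# One bar per class, including classes with zero observations
barplot(counts, ylab = "Frequency",
        col = c("red", "orange", "pink", "green", "blue", "pink", "orange", "red"))
```

With nine breaks there are eight intervals, which matches the eight colors. The default right=TRUE would instead put a value sitting exactly on a lower bound into the class below it.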

If you use a named classes vector, i.e. where the breaks are named,

> classes <- c('nul'=0, 
+              'Underweight\n(Severe thinness)\n<16.0'=16,
+              'Underweight\n(Moderate thinness)\n16.0 – 16.9'=17,
+              'Underweight\n(Mild thinness)\n17.0 – 18.4'=18.5,
+              'Normal range\n18.5 – 24.9'=25,
+              'Overweight\n(Pre-obese)\n25.0 – 29.9'=30,
+              'Obese\n(Class I)\n30.0 – 34.9'=35,
+              'Obese\n(Class II)\n35.0 – 39.9'=40,
+              'Obese\n(Class III)\n≥ 40.0'=Inf)

you can pass the names to the labels= argument of cut to make the plot look a little more sophisticated. (The padj parameter shifts the bar names down a little; it throws a warning, and I am not sure why.) To illustrate this, I use simulated data (see Data below).

> cols <- c("red", "orange", "pink", "green", "blue", "pink", "orange", "red")
> cut(bmi, breaks=classes, labels=names(classes)[-1]) |> table() |> 
+   barplot(col=cols, border=cols, padj=.5, cex.names=.8, ylab='Frequency')
> mtext('BMI', 1, 3.5)  # x-axis title, placed below the bar names

Alternatively, explicitly set freq=TRUE in hist. The resulting warning indicates why freq is deactivated by default in such cases: with unequal class widths, bars whose heights show counts have misleading areas.

> hist(bmi, freq=TRUE, main="", breaks=class_range, 
+      col=c("red",  "orange", "pink", "green", "blue", "pink", "orange", "red"))
Warning message:
In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle,  :
  the AREAS in the plot are wrong -- rather use 'freq = FALSE'
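
To see what the warning refers to, the counts and densities can be inspected without plotting. This is a rough sketch using the simulated data from the Data section; the breaks, in particular the finite outer limits, are assumptions chosen for illustration.

```r
# Simulated data, as in the Data section
set.seed(42)
bmi <- rgamma(1e3, 38, 1.42)

# Assumed breaks: class lower bounds with finite outer limits that span the data
upper <- max(45, ceiling(max(bmi)))
breaks <- c(0, 16, 17, 18.5, 25, 30, 35, 40, upper)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(bmi, breaks = breaks, plot = FALSE)

print(h$equidist)  # whether all classes have the same width
print(h$counts)    # frequencies per class
print(h$density)   # counts scaled by sample size and class width

# Density is the count divided by (n * class width)
print(all.equal(h$density, h$counts / (length(bmi) * diff(h$breaks))))
```

Because the classes differ in width, a wide class such as the normal range gets a large count but only a moderate density, which is why hist plots densities by default here.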


Data:

set.seed(42)
bmi <- rgamma(1e3, 38, 1.42)

Upvotes: 0

Edward

Reputation: 18493

By categorizing the BMI values into groups, you are effectively creating a categorical variable. Histograms are not the best plot to visualise this type of data. Try a bar chart, as shown below using base R and the ggplot2 package.

bmi.labels <- c("Underweight (severe)", "Underweight (moderate)", "Underweight (mild)",
                "Normal",
                "Overweight",
                "Obese (class I)", "Obese (class II)", "Obese (class III)")

bmi.gp <- cut(BMI, breaks=c(0, class_range, Inf), labels=bmi.labels)

bmi.cols <- c("red", "orange","pink","green","blue","pink","orange","red")

A (base R)

par(las=1, mar=c(4,10,1,1))
barplot(table(bmi.gp), col=bmi.cols, xlab="Frequency", horiz = TRUE, xlim=c(0,12))
mtext("BMI group", at=-3)
grid(ny = NA)

B (ggplot)

library(ggplot2)
library(dplyr)  # provides mutate() and count()

data.frame(bmi.gp) |>
  mutate(bmi.gp=factor(bmi.gp, levels=bmi.labels)) |>
  count(bmi.gp, .drop=FALSE) |>
  ggplot(aes(x=bmi.gp, y=n, fill=bmi.gp)) +
  geom_col(show.legend=FALSE) +
  scale_fill_manual(values=bmi.cols) +
  theme_light() +
  labs(y="Frequency", x="BMI group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
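
The .drop=FALSE argument in option B keeps BMI groups that have no observations, so every class still gets a position on the axis. In base R, table() on a factor appears to do the same. A small sketch with the question's data, repeated here so it runs on its own:

```r
# Data and breaks from the question
Height <- c(1.72, 1.86, 2.1, 1.7, 1.6, 1.67, 1.59, 1.88, 1.7, 1.72, 1.9, 1.88,
            1.59, 1.55, 1.91, 1.61, 1.82, 1.66, 1.77, 1.74)
Weight <- c(77, 79, 102, 70, 63, 62, 55, 89, 88, 88, 128, 100, 55, 60, 79, 59,
            57, 70, 72, 74)
BMI <- Weight / Height^2
class_range <- c(16, 16.9, 18.4, 24.9, 29.9, 34.9, 39.9)

bmi.labels <- c("Underweight (severe)", "Underweight (moderate)",
                "Underweight (mild)", "Normal", "Overweight",
                "Obese (class I)", "Obese (class II)", "Obese (class III)")

# cut() returns a factor that carries all eight levels, used or not
bmi.gp <- cut(BMI, breaks = c(0, class_range, Inf), labels = bmi.labels)

# table() reports every factor level, including those with a count of zero
counts <- table(bmi.gp)
print(counts)

# Groups that would vanish from the plot if empty levels were dropped
print(names(counts)[counts == 0])
```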

Upvotes: 0
