Chris Ruehlemann

Reputation: 21400

How to order boxes in boxplots by the medians of a numerical variable in a dataframe in base R

I have a dataframe with three variables; one ("group") is a factor with two levels, one ("word") is a character vector, and one ("duration") is numeric. For example:

DATA <- data.frame(
  group = c(rep("prefinal",10), rep("final", 10)),
  word  = c(sample(LETTERS[1:5], 10, replace = TRUE), sample(LETTERS[1:5], 10, replace = TRUE)),
  duration   = rnorm(20)
)
DATA
      group word    duration
1  prefinal    C  0.16378771
2  prefinal    E  0.13370196
3  prefinal    A  0.69112398
4  prefinal    B  0.21499187
5  prefinal    D -0.28998279
6  prefinal    D -2.00353522
7  prefinal    A  0.37842555
8  prefinal    E  1.62326170
9  prefinal    A -0.26294929
10 prefinal    B -0.54276322
11    final    D  1.32772171
12    final    E -1.84902285
13    final    C  0.01058158
14    final    E  1.49529743
15    final    B  0.55291290
16    final    A -0.35484820
17    final    D -0.16822110
18    final    A  0.88667458
19    final    E  0.70889916
20    final    B  1.12217332

I'd like to depict the durations of the words by group in boxplots:

boxplot(DATA$duration ~ DATA$group + DATA$word, 
        xaxt="n",
        col = rep(c("blue", "red"), 5))
axis(1, at = seq(from=1.5, to= 10.5, by=2), labels = sort(unique(DATA$word)), cex.axis = 0.9)

R seems to order the boxes in alphabetical order (of the "word" variable) by default.

EDIT:

However, I'd prefer the boxes to be sorted in descending order by the median duration that each item in the "word" variable has in the "prefinal" group. How can that be achieved?

Upvotes: 1

Views: 409

Answers (1)

Humpelstielzchen

Reputation: 6441

You can reorder the levels of DATA$word according to their median duration. reorder() sorts the levels in increasing order of the FUN value, so the - before DATA$duration produces a descending order.

DATA$word <-  reorder(DATA$word, -DATA$duration, FUN = median)

boxplot(DATA$duration ~ DATA$group + DATA$word, 
        xaxt="n",
        col = rep(c("blue", "red"), 5))
axis(1, at = seq(from=1.5, to= 10.5, by=2), labels = levels(DATA$word), cex.axis = 0.9)
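
As a quick check of what reorder() does here, the following minimal sketch uses a small toy data frame with fixed values (the name toy and its numbers are made up for illustration and are not the question's data) and compares the per-word medians with the resulting level order:

```r
# Toy data with fixed values (hypothetical, not the question's random sample)
toy <- data.frame(
  word     = c("A", "A", "B", "B", "C", "C"),
  duration = c(1, 3, 10, 20, 5, 7)
)

# Median duration per word: A = 2, B = 15, C = 6
meds <- tapply(toy$duration, toy$word, median)
print(meds)

# reorder() sorts levels by increasing FUN value, so negating the
# durations gives a descending order of medians
toy$word <- reorder(toy$word, -toy$duration, FUN = median)
levels(toy$word)
# "B" "C" "A"
```

The levels run from the largest median (B) to the smallest (A), which is the left-to-right order boxplot() will then use.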

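You can do the same using only the medians of the prefinal group, but it requires an additional step: compute the level order on the prefinal subset first, then apply it to the whole column and redraw the plot as before.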

ordered_levels <- levels(with(DATA[DATA$group == "prefinal",], reorder(word, -duration, FUN = median)))

DATA$word <- factor(DATA$word, levels = ordered_levels)
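
One caveat, offered as a sketch rather than a definitive fix: if a word never occurs in the prefinal group, it may be missing from ordered_levels, and factor() would then turn it into NA. The example below uses a hypothetical toy data frame with fixed values, computes the prefinal medians with tapply(), appends any words that are absent from prefinal, and redraws the plot:

```r
# Fixed toy data (hypothetical values) so the result is reproducible
toy <- data.frame(
  group    = rep(c("prefinal", "final"), each = 6),
  word     = rep(c("A", "A", "B", "B", "C", "C"), times = 2),
  duration = c(1, 3, 10, 20, 5, 7,   # prefinal: medians A = 2, B = 15, C = 6
               9, 9, 1, 1, 4, 4)     # final:    medians A = 9, B = 1,  C = 4
)

# Medians computed from the prefinal rows only
pre  <- toy[toy$group == "prefinal", ]
meds <- tapply(pre$duration, as.character(pre$word), median)

# Descending order of prefinal medians; words that never occur in
# prefinal are appended at the end so they are not turned into NA
ord <- names(sort(meds, decreasing = TRUE))
ord <- c(ord, setdiff(unique(as.character(toy$word)), ord))
toy$word <- factor(toy$word, levels = ord)

# Redraw with the new level order. Within each word the boxes follow
# the alphabetical group order: "final" (blue), then "prefinal" (red).
boxplot(duration ~ group + word, data = toy,
        xaxt = "n", col = rep(c("blue", "red"), nlevels(toy$word)))
axis(1, at = seq(1.5, by = 2, length.out = nlevels(toy$word)),
     labels = levels(toy$word), cex.axis = 0.9)
```

In this toy data the prefinal order is B, C, A, whereas ordering by the medians over both groups would give A, B, C, so the two approaches can produce different plots.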

Upvotes: 1
