Mohammad Farsadnia

Reputation: 31

How to order the X-axis in a box plot or QQ plot in R?

Here is a reproducible sample:

library(ggpubr)
library(rstatix)
library(tibble)

da.ma <-matrix(1:22000, 10, 22) ## a sample matrix

n <-seq(max(length(da.ma[1,]))) ## naming cols and rows
for (i in n) {
    c.names <- paste("k", n, sep = "")
}
colnames(da.ma) <- c.names 

n.pdf <-seq(length(da.ma[,1]))
for (i in n.pdf) {
    r.names <- paste("text",n.pdf, sep ="")
}
rownames(da.ma) <- r.names
col.names <-names(da.ma[1, ])

da.ma <-cbind(id = seq(length(da.ma[, 1])), da.ma) ##adding the id col
data <- as_tibble(da.ma)

in.anova <- data %>%
  gather(key = "Length", value = "TTR", colnames(data[, 2:23])) %>%
  convert_as_factor(id, Length)

The code above creates the data, but when I draw the plot, the X-axis is not in the right order:

ggboxplot(in.anova, x = "Length", y = "TTR", add = "point")

I need it to start from k1 and go up to k22. However, it starts from k1 and continues with k10, k11, k12, etc. The right order on the X-axis would be: k1, k2, k3, k4, ..., k21, and k22.

Upvotes: 0

Views: 498

Answers (3)

utubun

Reputation: 4520

Your X-axis is in order; however, it is in alphabetical order rather than numeric order. If you run 'k2' > 'k11' in your console, you will see what I mean.
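To make the point concrete, here is a minimal base R sketch. The vector name `lab` is only an illustration and is not part of the question's code; the text ordering shown assumes a typical locale in which digits collate in the usual way.

```r
# Strings are compared character by character, so "k10" sorts before "k2"
# because "1" comes before "2" at the second position.
lab <- paste0("k", 1:12)

sorted_as_text <- sort(lab)
print(sorted_as_text)

# Ordering by the numeric part instead gives the intended sequence.
sorted_by_number <- lab[order(as.numeric(sub("k", "", lab)))]
print(sorted_by_number)
```

The first vector starts with k1, k10, k11, k12, which is exactly the order that shows up on the X-axis.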

Next, to your reproducible example.

Sample matrix

  • I would avoid dot notation in names, because such names look like base R functions, which is confusing;
  • I would put spaces around the assignment operator; it is more readable;
  • As the data argument you provide a vector of length 22000 (1:22000), yet you tell matrix() that you want 10 rows and 22 columns. Since 10 x 22 = 220, only the first 220 of the 22000 values are used and the rest are ignored;
  • You can use set.seed() together with, e.g., sample() to generate reproducible random data.

Finally, your sample matrix generation would look like this:

set.seed(67600941)

mtrx <- matrix(sample(220), 10)

Rownames, colnames

  • In most cases you don't need to loop in R, since most functions are vectorized;
  • You don't need to save the names into a separate variable; you can assign them directly;
  • To keep the columns in order, I would use leading zeros, which are easy to add with sprintf();
  • You do not use the row names further, but I'll leave them as they are.

Final code:

rownames(mtrx) <- sprintf('text%02d', seq(nrow(mtrx)))
colnames(mtrx) <- sprintf('k%02d',    seq(ncol(mtrx)))

mtrx[1:5, 1:5]

#        k01 k02 k03 k04 k05
# text01 206 127   9   1 138
# text02 191  46 220  59  73
# text03 145  15 148 213 103
# text04  80 115 211  62  79
# text05  28  11 195 136  84
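As a quick check of why the leading zeros help, the following base R sketch (the names `padded` and `shuffled` are illustrative only) confirms that zero-padded labels sort the same way as text and as numbers. It assumes at most 99 columns, since `%02d` pads to two digits only.

```r
# Zero-padded labels all have the same width, so text order equals numeric order.
padded <- sprintf("k%02d", 1:22)
is_sorted <- identical(sort(padded), padded)
print(is_sorted)

# factor() sorts its levels as text by default, so even shuffled input
# ends up with the levels k01, k02, ..., k22.
shuffled <- sample(padded)
print(levels(factor(shuffled)))
```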

Data preprocessing and plotting

  • You can use rowid_to_column() from tibble to create your id column during preprocessing;
  • I would use pivot_longer() instead of gather(), since gather() is superseded;
  • You don't need to care about the levels, since the leading zeros put Length in the right alphabetical order;
  • I didn't save the transformed data into an intermediate variable, just to save space.

library(tidyverse)

mtrx %>%
    as_tibble() %>%
    rowid_to_column('id') %>%
    pivot_longer(-id, names_to = 'Length', values_to = 'TTR') %>%
    mutate(Length = factor(Length)) %>%
    ggplot(aes(x = Length, y = TTR)) +
      geom_jitter() +
      geom_boxplot(fill = NA) +
      ggthemes::theme_few()

Upvotes: 0

dario

Reputation: 6485

in.anova$Length <- factor(in.anova$Length, levels = paste0("k", 1:22))  
ggboxplot(in.anova, x = "Length", y = "TTR", add = "point")

This returns a box plot with the X-axis in the order k1, k2, ..., k22.

Upvotes: 2

Beril Boga

Reputation: 97

You can use the factor() function with a predefined order as the levels of the Length column.

library(rstatix)
library(ggpubr)
da.ma <-matrix(1:22000, 10, 22) ## a sample matrix

n <-seq(max(length(da.ma[1,]))) ## naming cols and rows
for (i in n) {
    c.names <- paste("k", n, sep = "")
}
colnames(da.ma) <- c.names 

n.pdf <-seq(length(da.ma[,1]))
for (i in n.pdf) {
    r.names <- paste("text",n.pdf, sep ="")
}
rownames(da.ma) <- r.names
col.names <-names(da.ma[1,])

da.ma <-cbind(id =seq(length(da.ma[,1])), da.ma) ##adding the id col
library(tibble)
data <- as_tibble(da.ma)

in.anova <- data %>%
  gather(key = "Length", value = "TTR", colnames(data[,2:23])) %>%
  convert_as_factor(id, Length)
 

# Get the unique Length values as character strings
lvls <- unique(as.character(in.anova$Length))

# Sort them by the numeric part that follows the leading "k"
lvls <- lvls[order(as.numeric(substr(lvls, 2, 4)))]

# Convert Length to a factor whose levels follow that numeric order
in.anova$Length <- factor(in.anova$Length, levels = lvls)

ggboxplot(in.anova, x = "Length", y = "TTR", add = "point")
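The same idea can be wrapped in a small helper. This is only a sketch: `order_levels_by_number()` is a hypothetical function name, not part of any package, and it assumes each label contains exactly one run of digits (as in k1, k2, ..., k22).

```r
# Hypothetical helper: turn labels such as "k1", "k10" into a factor
# whose levels are ordered by the number embedded in each label.
order_levels_by_number <- function(x) {
  x <- as.character(x)
  lvls <- unique(x)
  # Drop every non-digit character and read what is left as a number
  num <- as.numeric(gsub("[^0-9]", "", lvls))
  factor(x, levels = lvls[order(num)])
}

len <- order_levels_by_number(c("k10", "k2", "k1", "k2", "k22"))
print(levels(len))
```

With the data above, it could be applied as in.anova$Length <- order_levels_by_number(in.anova$Length) before calling ggboxplot().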

Upvotes: 0
