Micrified

Reputation: 3640

Multiple boxplots in R from unlabelled matrix?

Problem

I've got some data produced by a computer program, which I import into R as a matrix. There is an even number of columns, with columns (2*i, 2*i+1) holding two variables measured under condition i. I've made a visualisation of this below, along with the kind of boxplot I am trying to produce:

enter image description here


Attempts

Unfortunately, the columns don't have any labels, or anything like that, and I'm not sure how to get multiple boxplots in the case where I have two columns representing the different labels in this format.

I've tried to adapt this excellent question to my case, but its columns are effectively a combined version of the (A,B) pairs you see in my diagram plus a label column, so I'm not sure how to rework it.


Here's what I've got so far, but neither the grouping nor the categories are there:

enter image description here

Since it's useful to have the actual data, I've posted a link to my data here.

Upvotes: 2

Views: 156

Answers (2)

jay.sf

Reputation: 72683

You may subset the data along a vector of conditions.

(cond <- rep(LETTERS[1:2], ncol(d)/2))
# [1] "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B"

boxplot(d, boxfill=NA, border=NA, xaxt="n", xlim=c(0, 17.75),   ## initialize plot
        xlab="index", ylab="value", main="My plot")
boxplot(d[cond == "A"], xaxt="n", add=TRUE, boxfill=2,  ## subset A
        boxwex=0.35, at=which(cond == "A") - .25)
boxplot(d[cond == "B"], xaxt="n", add=TRUE, boxfill=4,  ## subset B
        boxwex=0.35, at=which(cond == "A") + .25)
## axis
axis(1, seq(ncol(d))[(seq(ncol(d)) + 1) %% 2 == 0], labels=1:(ncol(d)/2))
## optional legend
legend("topleft", leg=cond[1:2], pch=22, pt.bg=c(2, 4), col=1, bty="n")

enter image description here
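
The `at` offsets can look like a typo at first glance, since both subset calls use `which(cond == "A")`. A minimal sketch of the arithmetic, assuming 18 columns as in the printed `cond` output above (`n_col`, `centre`, `at_A` and `at_B` are names made up for this illustration):

```r
n_col <- 18                               # assumed: 9 conditions x 2 variables
cond <- rep(LETTERS[1:2], n_col / 2)      # "A" "B" "A" "B" ...

centre <- which(cond == "A")              # odd positions: one slot per condition
at_A <- centre - 0.25                     # A box sits just left of the slot
at_B <- centre + 0.25                     # B box sits just right of the same slot

head(cbind(centre, at_A, at_B), 3)        # inspect the first three slots
```

Both boxes of a condition share the slot at `centre`, which is why the `axis()` call only labels the odd positions.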


Data:

d <- read.csv("https://gist.githubusercontent.com/Micrified/4bb8c392300998e99320bf5ec3ba3d01/raw/765baf87f8fe40ccd58c145d49a3c21ee6009de5/data.csv")

Upvotes: 1

hdkrgr

Reputation: 1736

You will need to convert your data from a matrix to a data frame and extract the information about the groups (i) and first/second column within each group somehow.

A possible solution:

library(tidyverse) # we'll use tibble, tidyr, ggplot2 and purrr
i = 3
n_cols_per_i = 2
mat <- matrix(1:(i*n_cols_per_i*9), ncol=n_cols_per_i * i)
# 3*2 columns of 9 values each

name_fn <- function(group, col){
  paste0('group_', group, '_col_', col)
}

# each= keeps adjacent columns in the same group: 1,1,2,2,3,3 paired with A,B,A,B,A,B
colnames(mat) <- map2_chr(rep(1:i, each=n_cols_per_i), rep(c("A", "B"), i), name_fn)

df <- as_tibble(mat)

df <- df %>% pivot_longer(
  cols=everything(),
  names_to = c("group", "col"),
  names_pattern = "group_(.+)_col_(.+)"  # .+ also matches group numbers of 10 and above
  )

df %>% ggplot(aes(y=value, x=group, fill=col)) +
  geom_boxplot()

enter image description here

The resulting df is in long format (one row per value, with group and col as label columns), so you can apply the other plots from the linked question analogously.
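
If the real matrix has no column names at all, the labels can also be derived from the column index instead of being parsed out of names. A minimal base-R sketch, assuming adjacent columns form a pair and using random placeholder data (`m`, `long`, `condition` and `variable` are illustrative names, not taken from the question's data):

```r
set.seed(1)
m <- matrix(rnorm(9 * 6), ncol = 6)       # placeholder: 3 conditions x 2 variables, 9 rows

col_idx <- as.vector(col(m))              # column index of every cell, column-major
long <- data.frame(
  value     = as.vector(m),                                # same column-major order
  condition = factor((col_idx - 1) %/% 2 + 1),             # columns 1-2 -> 1, 3-4 -> 2, ...
  variable  = factor(c("A", "B")[(col_idx - 1) %% 2 + 1])  # odd column -> A, even -> B
)

# One box per variable within each condition, coloured by variable
boxplot(value ~ variable + condition, data = long, col = c(2, 4),
        xlab = "variable.condition", ylab = "value")
```

The same `long` data frame should also work with the ggplot2 call above, for example `ggplot(long, aes(x = condition, y = value, fill = variable)) + geom_boxplot()`.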

Upvotes: 1
