rustymarmot

Reputation: 121

Multiple QQ Plots on Data Set with Unknown number and name of variables

I want to produce QQ plots (points and lines) for chemistry data that arrives as an Excel file with an unknown number of variables and observations, then store each plot as an object (e.g. qqplot1, qqplot2, qqplot3, ...) for later inclusion in a summary report. I'm writing a generic script to run on data sets as they arrive, so the number and names of the variables will vary.

The first bit of the script makes a data frame (df); this is pretty much how the data looks after import from Excel. This one has three variables (As, Ba, Cu), and the number of observations varies by variable.
As = c(10, 20, 10, 12, 7, 14, 6, 9, 11, 15)
Ba = c(110, 120, 210, 112, 97, 214, 116, 211, 115, NA)
Cu = c(1, 1, 2, 11, 9, 21, 16, 19, NA, NA )
df = data.frame(As, Ba, Cu)

I can facet-wrap all the variables by first pivoting, then plotting (see the code below). pivot_longer gives the resulting columns generic names (name and value). This is OK when there are only three variables, but not so good with 20 or up to 50 variables.

Ideally, I would like to save each of the plots as objects that are sequentially numbered for later inclusion in summary HTML or PDF report.

Any ideas are welcome. RM

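library(ggplot2)
library(tidyr)
library(qqplotr)  # stat_qq_band() and stat_qq_point() come from qqplotr

# Long format: one row per (variable, value) pair, in columns name and value
df_l = pivot_longer(df, cols = everything())

qqplot <- ggplot(data = df_l, mapping = aes(sample = value)) +
  stat_qq_band(alpha=0.5) +
  stat_qq_line() +
  stat_qq_point() +
  facet_wrap(~ name, scales = "free") +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles")
qqplot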

PS: I was sort of going down this route, but I want it in ggplot and have no idea how to save the plots with sequential numbers.

par(mfrow = c(1, 1))
# One base-graphics QQ plot per column, titled with the column name
for (i in seq_along(df)) {
  qqnorm(df[, i], main = names(df)[i])
  qqline(df[, i])
}

Upvotes: 1

Views: 1370

Answers (1)

stefan

Reputation: 124048

Maybe this is what you are looking for. Using e.g. purrr::imap (or lapply or ...) this could be achieved like so:

  1. Put your code for the qqplot inside a function

  2. Split your long df by name

  3. Use purrr::imap to loop over the split df

    • Using imap has the advantage of passing the name of each split (i.e. the variable name) to the function, which makes it easy to add a title to the plot.
    • A second option for titling your plots would be to keep the facet_wrap, which results in a facet-like title for the plot (the code below does both).

As a result you get a named list of qqplots:

As = c(10, 20, 10, 12, 7, 14, 6, 9, 11, 15)
Ba = c(110, 120, 210, 112, 97, 214, 116, 211, 115, NA)
Cu = c(1, 1, 2, 11, 9, 21, 16, 19, NA, NA )
df = data.frame(As, Ba, Cu)

library(ggplot2)
library(tidyr)
library(purrr)
library(qqplotr)

df_l = pivot_longer(df, cols = everything())

my_qqplot <- function(.data, .title) {
  ggplot(data = .data, mapping = aes(sample = value)) +
    stat_qq_band(alpha=0.5) +
    stat_qq_line() +
    stat_qq_point() +
    facet_wrap(~ name, scales = "free") +
    labs(x = "Theoretical Quantiles", y = "Sample Quantiles", title = .title)
}

qqplots <- df_l %>% 
  split(.$name) %>% 
  imap(my_qqplot)

qqplots$As # or qqplots[[1]]
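
If you also want the sequential names from the question (qqplot1, qqplot2, ...), one option is to rename the list elements and keep a small lookup table from number to variable. The following is only a minimal sketch under some assumptions: it rebuilds a stand-in list of plots with ggplot2's own stat_qq() and stat_qq_line() instead of the qqplotr layers so that it runs on its own, and make_qq and plot_index are hypothetical names introduced here for illustration.

```r
library(ggplot2)

# Example data, same shape as above
df <- data.frame(
  As = c(10, 20, 10, 12, 7, 14, 6, 9, 11, 15),
  Ba = c(110, 120, 210, 112, 97, 214, 116, 211, 115, NA),
  Cu = c(1, 1, 2, 11, 9, 21, 16, 19, NA, NA)
)

# Hypothetical helper: one QQ plot per column, NAs dropped up front.
# Assumption: plain ggplot2 layers stand in for the qqplotr ones.
make_qq <- function(var) {
  d <- data.frame(value = df[[var]])
  d <- d[!is.na(d$value), , drop = FALSE]
  ggplot(d, aes(sample = value)) +
    stat_qq() +
    stat_qq_line() +
    labs(x = "Theoretical Quantiles", y = "Sample Quantiles", title = var)
}

# Named list of plots, one element per variable
qqplots <- lapply(names(df), make_qq)
names(qqplots) <- names(df)

# Record which number belongs to which variable, then rename the list
plot_index <- data.frame(
  id = paste0("qqplot", seq_along(qqplots)),
  variable = names(qqplots),
  stringsAsFactors = FALSE
)
names(qqplots) <- plot_index$id

qqplots$qqplot2  # the Ba plot; same object as qqplots[[2]]
```

Keeping the plots in one list rather than as separate objects in the workspace means the script does not need to know how many variables there are. In an R Markdown chunk, for (p in qqplots) print(p) should then render every plot in order, and ggsave() can write each one to a file if the report needs images.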

Upvotes: 2
