cmirian
cmirian

Reputation: 2253

How to incorporate visualization of log2-transformation in function?

I have this norm_f() function with help from this thread. The function produces two plots to assess the normal distribution:

> norm_f(iris$Sepal.Length)

enter image description here

The function is written with:

norm_f <- function(z, hist. = list(col = "lightgray"),
                   lines. = list(lwd = 2), curve. = list(col = "red", lwd = 2),
                   qqnorm. = list(col = "black", pch = 1, cex = 1.2), qqline. = list(lwd = 2)) {
  zname <- deparse(substitute(z))
  zden <- density(z)
  if (!"ylim" %in% names(hist.)) {
    ymax <- max(zden$y, dnorm(c(seq(min(z), max(z), len = 21)), mean = mean(z), sd = sd(z)))
    hist.$ylim <- c(0, ymax)
  }
  par(mfrow=c(1,2), mar=c(5,5,2,2))
  if (!"main" %in% names(hist.)) hist.$main <- paste("Histogram of", zname)
  if (!"xlab" %in% names(hist.)) hist.$xlab <- zname
  do.call(hist, c(list(z,  prob = TRUE), hist.))
  do.call(lines, c(list(zden), lines.))
  do.call(curve, c(list(substitute(dnorm(x, mean=mean(z), sd=sd(z))),
                        add = TRUE), curve.))
  do.call(qqnorm, c(list(z), qqnorm.))
  do.call(qqline, c(list(z), qqline.))
  
}

I want to update the function, so that it produces 2x2 plots: two without log2-transformation (as above) and two plots with log2-tansformation of the covariate (i.e., corresponding to norm_f(log2(iris$Sepal.Length)))

Expected output:

enter image description here

Upvotes: 0

Views: 321

Answers (1)

Francesco Grossetti
Francesco Grossetti

Reputation: 1595

I modified the original function norm_f() to work with ggplot2 and patchwork. Now the function takes three arguments: df which is your data.frame, var which is a character representing your target variable (e.g., Sepal.Length) and thefun which calls the scaling function you want without quotes. The plots are rendered through ggplot2 and their layout is organized via patchwork.

Please, have a look at the code below to check if it fits your needs. I run two examples, one with the requested log2() function and the second one with exp().

library(ggplot2)
library(patchwork)

data(iris)

norm_f <- function( df, var, thefun, ... ) {
  
  # this gets the name of the function invoked
  fun_name = as.character(match.call()[4]) 
  # this matches the function in `thefun` so that you can use it
  thefun = match.fun(thefun)
  
  # compute the log2
  va2_log2 <- thefun(df[ , var ])
  
  # specify we want the density for the histogram
  y <- "..density.."
  # Histogram of "var"
  p_hist <- ggplot( df, aes_string(x = var) ) +
    geom_histogram(aes_string( y = y ) ) + 
    geom_density() +
    stat_function(fun = dnorm, 
                  args = list(mean = mean(df[ , var ], na.rm = TRUE), 
                              sd = sd(df[ , var ], na.rm = TRUE)), 
                  colour = 'blue') + 
    ggtitle(paste0("Histogram of ", var) )
  
  # QQ-plot of "var"
  p_qq <- ggplot( df, aes_string(sample = var) ) +
    stat_qq() + stat_qq_line() +
    ggtitle(paste0("QQ-Plot of ", var) )
  
  # Histogram of log2("var")
  p_hist_log <- ggplot( df, aes_string(x = va2_log2) ) +
    geom_histogram(aes_string( y = y ) ) + 
    geom_density() +
    stat_function(fun = dnorm, 
                  args = list(mean = mean(va2_log2, na.rm = TRUE), 
                              sd = sd(va2_log2, na.rm = TRUE)), 
                  colour = 'blue') +
    ggtitle( paste0("Histogram of ", fun_name, "(", var, ")" ) )
  
  # QQ-plot of log2("var")
  p_qq_log <- ggplot( df, aes_string(sample = va2_log2) ) +
    stat_qq() + stat_qq_line() +
    ggtitle( paste0("QQ-Plot of ", fun_name, "(", var, ")" ) )
  
  print((p_hist + p_qq) / (p_hist_log + p_qq_log))
  
}

norm_f(df = iris, var = "Sepal.Length", thefun = log2 )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

norm_f(df = iris, var = "Sepal.Length", thefun = exp )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2021-03-01 by the reprex package (v1.0.0)

Upvotes: 1

Related Questions