Reputation: 2253
I have this norm_f()
function with help from this thread. The function produces two plots to assess the normal distribution:
> norm_f(iris$Sepal.Length)
The function is written with:
norm_f <- function(z, hist. = list(col = "lightgray"),
lines. = list(lwd = 2), curve. = list(col = "red", lwd = 2),
qqnorm. = list(col = "black", pch = 1, cex = 1.2), qqline. = list(lwd = 2)) {
zname <- deparse(substitute(z))
zden <- density(z)
if (!"ylim" %in% names(hist.)) {
ymax <- max(zden$y, dnorm(c(seq(min(z), max(z), len = 21)), mean = mean(z), sd = sd(z)))
hist.$ylim <- c(0, ymax)
}
par(mfrow=c(1,2), mar=c(5,5,2,2))
if (!"main" %in% names(hist.)) hist.$main <- paste("Histogram of", zname)
if (!"xlab" %in% names(hist.)) hist.$xlab <- zname
do.call(hist, c(list(z, prob = TRUE), hist.))
do.call(lines, c(list(zden), lines.))
do.call(curve, c(list(substitute(dnorm(x, mean=mean(z), sd=sd(z))),
add = TRUE), curve.))
do.call(qqnorm, c(list(z), qqnorm.))
do.call(qqline, c(list(z), qqline.))
}
I want to update the function, so that it produces 2x2 plots: two without log2-transformation (as above) and two plots with log2-tansformation of the covariate (i.e., corresponding to norm_f(log2(iris$Sepal.Length))
)
Expected output:
Upvotes: 0
Views: 321
Reputation: 1595
I modified the original function norm_f()
to work with ggplot2 and patchwork. Now the function takes three arguments: df
which is your data.frame, var
which is a character representing your target variable (e.g., Sepal.Length
) and thefun
which calls the scaling function you want without quotes. The plots are rendered through ggplot2 and their layout is organized via patchwork.
Please, have a look at the code below to check if it fits your needs. I run two examples, one with the requested log2()
function and the second one with exp()
.
library(ggplot2)
library(patchwork)
data(iris)
norm_f <- function( df, var, thefun, ... ) {
# this gets the name of the function invoked
fun_name = as.character(match.call()[4])
# this matches the function in `thefun` so that you can use it
thefun = match.fun(thefun)
# compute the log2
va2_log2 <- thefun(df[ , var ])
# specify we want the density for the histogram
y <- "..density.."
# Histogram of "var"
p_hist <- ggplot( df, aes_string(x = var) ) +
geom_histogram(aes_string( y = y ) ) +
geom_density() +
stat_function(fun = dnorm,
args = list(mean = mean(df[ , var ], na.rm = TRUE),
sd = sd(df[ , var ], na.rm = TRUE)),
colour = 'blue') +
ggtitle(paste0("Histogram of ", var) )
# QQ-plot of "var"
p_qq <- ggplot( df, aes_string(sample = var) ) +
stat_qq() + stat_qq_line() +
ggtitle(paste0("QQ-Plot of ", var) )
# Histogram of log2("var")
p_hist_log <- ggplot( df, aes_string(x = va2_log2) ) +
geom_histogram(aes_string( y = y ) ) +
geom_density() +
stat_function(fun = dnorm,
args = list(mean = mean(va2_log2, na.rm = TRUE),
sd = sd(va2_log2, na.rm = TRUE)),
colour = 'blue') +
ggtitle( paste0("Histogram of ", fun_name, "(", var, ")" ) )
# QQ-plot of log2("var")
p_qq_log <- ggplot( df, aes_string(sample = va2_log2) ) +
stat_qq() + stat_qq_line() +
ggtitle( paste0("QQ-Plot of ", fun_name, "(", var, ")" ) )
print((p_hist + p_qq) / (p_hist_log + p_qq_log))
}
norm_f(df = iris, var = "Sepal.Length", thefun = log2 )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
norm_f(df = iris, var = "Sepal.Length", thefun = exp )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2021-03-01 by the reprex package (v1.0.0)
Upvotes: 1