I'm trying to write something that will take a data frame and plot every numeric column, with simple plots for non-normal distributions and control charts for normal ones:
library(plyr)
library(qcc)
library(ggplot2)
#generate data frame
data <- data.frame(seq_len(10),LETTERS[seq_len(10)],rnorm(10,5,3),rep(1,10),rep(2,10),rnorm(10,3,1),runif(10))
## Check that the values are not all identical
has_range <- function(data) {
  if (all(abs(data - mean(data)) == 0)) FALSE else TRUE
}
## Test for normality (Shapiro-Wilk); constant columns count as non-normal
normtest <- function(data) {
  if (has_range(data) == FALSE) FALSE else {
    if (shapiro.test(data)$p.value < 0.05) FALSE else TRUE
  }
}
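A guard like has_range() matters because shapiro.test() stops with an error on constant data, and also when the sample has fewer than 3 or more than 5000 values. A minimal sketch of a normality check that returns FALSE in those cases instead of erroring; the name is_normal() and its alpha argument are hypothetical, not part of the question's code:

```r
# Hypothetical helper: TRUE if Shapiro-Wilk does not reject normality at `alpha`,
# FALSE if it does or if shapiro.test() could not be run on the data.
is_normal <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                        # shapiro.test() ignores NAs; drop them up front
  n <- length(x)
  if (n < 3 || n > 5000) return(FALSE)     # outside the sample sizes shapiro.test() accepts
  if (diff(range(x)) < 1e-10) return(FALSE) # (near-)constant column: nothing to test
  shapiro.test(x)$p.value >= alpha         # same cut-off as the question's p < 0.05 test
}

is_normal(rep(1, 10))  # FALSE: constant column
is_normal(1:2)         # FALSE: too few values
```

Returning FALSE for untestable columns sends them to the simple-plot branch, which matches what the question's normtest() does for constant columns.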
## Control charts for normal data, simple plots otherwise
drawplot <- function(data, ref = NULL) {
  Sys.sleep(.1)
  print(names(data))
  if (normtest(data) == FALSE) {
    plot(x = ref, y = data, ylab = names(data))
  } else {
    qcc(data, type = "xbar.one", labels = ref, ylab = names(data))
  }
}
## Apply drawplot to all numeric columns in data frame
colwise(drawplot, is.numeric, ref=data[[2]])(data)
The problem is that every apply-family function seems to strip the column names, so I can't use them to label the plots. Inside drawplot,
print(names(data))
prints NULL.
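The NULL comes from how the apply family hands columns to the function. A minimal demonstration with lapply() on a small made-up data frame (df is hypothetical): each call receives a bare vector, and the column names end up only on the returned list.

```r
# Hypothetical two-column data frame
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# Each call sees one column as a plain vector with no names attribute
inner_names <- lapply(df, function(col) names(col))
str(inner_names)    # both elements are NULL

# The column names are attached to the result list instead
names(inner_names)  # "a" "b"
```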
Also, a seemingly unrelated error keeps cropping up:
Error: length(rows) == 1 is not TRUE
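colwise(), like lapply(), calls your function on each column as a bare vector, so the column name isn't available inside the function. Instead, write a function that takes the column name and looks the column up itself.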
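Besides looking each column up by name, as the code further down does, another option is to pass each column together with its name using base R's Map(). This is a sketch of that alternative, not the approach used in this answer; describe_col() and df are hypothetical stand-ins for drawplot() and the question's data:

```r
# Hypothetical data frame with one non-numeric column
df <- data.frame(id = 1:5, label = letters[1:5],
                 x = c(4.8, 5.1, 5.3, 4.9, 5.0), y = c(1, 3, 2, 5, 4))

# Stand-in for drawplot(): receives the column AND its name
describe_col <- function(col, name) {
  sprintf("%s: mean %.2f", name, mean(col))
}

num <- Filter(is.numeric, df)             # keeps only numeric columns, names intact
res <- Map(describe_col, num, names(num)) # pairs column i with name i
unlist(res, use.names = FALSE)
# "id: mean 3.00" "x: mean 5.02" "y: mean 3.00"
```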
You can't pass x = NULL to plot, so I've rewritten part of your function (qcc was also kicking up a fuss with an atomic vector for x).
Something like:
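drawplot <- function(n, data, ref = NULL) {
  Sys.sleep(.1)
  print(n)                                # n is the column name
  if (normtest(data[[n]]) == FALSE) {
    # plot() can't take x = NULL, so fall back to the row index
    if (is.null(ref)) { ref <- seq_along(data[[n]]) }
    plot(x = ref, y = data[[n]], ylab = n)
  } else {
    # drop = FALSE keeps a one-column data frame instead of a bare vector
    qcc(data[, n, drop = FALSE], type = "xbar.one", labels = ref, ylab = n)
  }
}
## Iterate over the names of the numeric columns, not over the columns themselves
lapply(names(Filter(is.numeric, data)), drawplot, data = data)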
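The seq_along() fallback for a missing ref can be tried on its own. A small sketch, assuming a hypothetical helper plot_column() and drawing to a null pdf device so no window opens; the helper returns the x values it actually used:

```r
# Hypothetical helper mirroring the fallback in drawplot():
# with no `ref`, plot against the observation index 1..length(y).
plot_column <- function(y, ylab, ref = NULL) {
  if (is.null(ref)) ref <- seq_along(y)
  plot(x = ref, y = y, ylab = ylab)
  invisible(ref)  # report which x values were plotted
}

pdf(NULL)         # null device: nothing is written or displayed
used <- plot_column(c(3, 1, 2), ylab = "demo")
dev.off()
used              # 1 2 3
```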
Note that this function also works with positional (column-number) indexing, but the plot labels would then be column numbers rather than names.
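A sketch of the positional variant, with a hypothetical df and show_col() standing in for the question's data and drawplot(): which() over the numeric test yields column numbers, so n arrives as an integer and any label built from it is a number rather than a name.

```r
# Hypothetical data frame: columns 1 and 3 are numeric
df <- data.frame(id = 1:4, label = c("a", "b", "c", "d"),
                 x = c(2.2, 1.9, 2.4, 2.0))

# Stand-in for drawplot(): n is a column position here, not a name
show_col <- function(n, data) {
  paste("column", n, "has", length(data[[n]]), "values")
}

num_pos <- which(vapply(df, is.numeric, logical(1)))  # named integer: id = 1, x = 3
unname(vapply(num_pos, show_col, character(1), data = df))
# "column 1 has 4 values" "column 3 has 4 values"
```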