Reputation: 555
I would like to build a function that works similarly to the built-in pairs() function. Given the data set df, I would like to create a matrix of scatter plots between the numeric columns of the data set, with a histogram of each column, overlaid with its density curve, on the diagonal. My problem is specifically with the diagonal part of the plots. Without the diagonal part of the code, everything works fine (one can test this on the iris data set).
new_pairs<-function(df, x){
par(mar=c(1,1,1,1))
n_col<-sum(sapply(df, is.numeric))
par(mfrow=c(n_col,n_col))
n<-ncol(df)
for (i in 1:n){
for (j in 1:n){
if ((class(df[,i])!="factor" ) & (class(df[,j])!="factor") & i!=j)
{plot(df[,i], df[,j], col = df[,x])}
else if ((class(df[,i])!="factor") & (class(df[,j])!="factor") & i==j)
{hist(df[,i], breaks=10, probability=T, main=NULL)} {lines(density(df[,i]))}
}
}
}
new_pairs(df,2)
It seems that including the line {lines(density(df[,i]))} is not allowed: I get an error message. I therefore tried to write a separate histogram-plus-density function that also accounts for missing values. It works fine on a single numeric column, but I do not know how to integrate it into the new_pairs() function. Here is the density function:
hist_density = function (df[,3]) {
N = length(df[,3])
df[,3] <- na.omit(df[,3])
hist( df[,3], col = "light blue",
probability = TRUE, main=NULL)
lines(density(df[,3]), col = "red", lwd = 3)
}
Upvotes: 1
Views: 56
Reputation: 886988
Maybe we can call lines() in each of the branches:
new_pairs <- function(df, x) {
  par(mar = c(1, 1, 1, 1))
  # one grid cell per pair of numeric columns
  n_col <- sum(sapply(df, is.numeric))
  par(mfrow = c(n_col, n_col))
  n <- ncol(df)
  for (i in 1:n) {
    for (j in 1:n) {
      if (!is.factor(df[, i]) && !is.factor(df[, j]) && i != j) {
        # off-diagonal: scatter plot, coloured by column x
        plot(df[, i], df[, j], col = df[, x])
        lines(density(na.omit(df[, i])))
      } else if (!is.factor(df[, i]) && !is.factor(df[, j]) && i == j) {
        # diagonal: histogram with density curve
        hist(df[, i], breaks = 10, probability = TRUE, main = NULL)
        lines(density(na.omit(df[, i])))
      }
      # pairs involving a factor column are skipped
    }
  }
}
new_pairs(iris, 3)
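As far as I can tell, the error in the original comes from writing two brace blocks on one line, {hist(...)} {lines(...)}: the parser treats the first block as the complete body of the else if and then meets an unexpected {. A small check, using parse() on strings so that nothing is plotted (the strings and the helper name parses are only illustrative):

```r
# Two brace blocks on one line, as in the question
bad <- "if (TRUE) {x <- 1} {y <- 2}"
# Both statements inside a single pair of braces
good <- "if (TRUE) {x <- 1; y <- 2}"

# Hypothetical helper: TRUE if the string is syntactically valid R
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses(bad)
parses(good)
```

The first string should fail to parse and the second should parse, which is why moving both calls inside one pair of braces fixes the function.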
The hist part can be moved into a separate function:
hist_density <- function(df, i) {
  # drop missing values before plotting
  tmp <- na.omit(df[, i])
  hist(tmp, col = "light blue",
       probability = TRUE, main = NULL)
  lines(density(tmp), col = "red", lwd = 3)
}
new_pairs <- function(df, x) {
  par(mar = c(1, 1, 1, 1))
  n_col <- sum(sapply(df, is.numeric))
  par(mfrow = c(n_col, n_col))
  n <- ncol(df)
  for (i in 1:n) {
    for (j in 1:n) {
      if ((class(df[, i]) != "factor") & (class(df[, j]) != "factor") & i != j) {
        plot(df[, i], df[, j], col = df[, x])
        lines(density(na.omit(df[, i])))
      } else if ((class(df[, i]) != "factor") & (class(df[, j]) != "factor") & i == j) {
        hist_density(df, i)
      } else {
        NA
      }
    }
  }
}
new_pairs(iris, 3)
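A possible variation on the same idea, only as a sketch: select the numeric columns once up front, so the loop indices match the plot grid, and restore the graphics settings on exit. The name new_pairs2 is made up here, and colouring by column 5 (Species) is just my choice for iris.

```r
# Sketch: grid of scatter plots with histogram + density on the diagonal.
# new_pairs2 is a hypothetical name; x is the column used for point colours.
new_pairs2 <- function(df, x) {
  # keep only numeric columns for the grid
  num <- df[, sapply(df, is.numeric), drop = FALSE]
  k <- ncol(num)
  old <- par(mfrow = c(k, k), mar = c(1, 1, 1, 1))
  on.exit(par(old))  # restore the caller's graphics settings
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) {
        # diagonal: histogram with density curve, missing values dropped
        v <- na.omit(num[, i])
        hist(v, breaks = 10, probability = TRUE, main = NULL)
        lines(density(v))
      } else {
        # off-diagonal: scatter plot coloured by column x of the full data
        plot(num[, i], num[, j], col = df[, x])
      }
    }
  }
  invisible(k)  # number of numeric columns plotted
}

new_pairs2(iris, 5)
```

Unlike the version above, this does not draw a density line on the scatter panels, and it treats any non-numeric column (not only factors) as one to skip.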
Upvotes: 2
Reputation: 8572
At the risk of being superfluous: without inventing a new method, you could use pairs itself and simply use the diag.panel argument to add the histogram and density yourself. The code below is taken from an example in help(pairs), with the density added. This might be a cleaner solution.
# Add histograms + density (taken from help("pairs"))
panel.hist <- function(x, ...)
{
  # rescale the y axis of the panel and restore it on exit
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  # histogram bars, scaled so the tallest bar has height 1
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y / max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
  # density, scaled to the same maximum height
  dens <- density(x); dens$y <- dens$y / max(dens$y)
  lines(dens)
}
pairs(iris[, 1:4], diag.panel = panel.hist)
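Since the question also asks about missing values and colouring the points, here is a hedged variant of the panel function. density() stops on NA input by default, so the sketch drops non-finite values first; it also does not forward ... to rect(), because a col argument passed to pairs() would otherwise collide with col = "cyan". The name panel.hist.na and the two NA values injected into iris are only for illustration.

```r
# Sketch of an NA-tolerant diagonal panel (panel.hist.na is a made-up name)
panel.hist.na <- function(x, ...) {
  x <- x[is.finite(x)]  # drop NA/NaN/Inf before hist() and density()
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  # bars scaled so the tallest has height 1; '...' is deliberately not forwarded
  rect(h$breaks[-nB], 0, h$breaks[-1], h$counts / max(h$counts), col = "cyan")
  d <- density(x)
  lines(d$x, d$y / max(d$y))
}

# Illustrative data: iris with two values set to NA
df <- iris
df[c(3, 40), 1] <- NA
pairs(df[, 1:4], diag.panel = panel.hist.na, col = df$Species)
```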
Upvotes: 2