Sooji

Reputation: 169

Plotting distributions of all columns in an R data frame

I'm trying to come up with a clean way to plot a grid view of all the columns in an R data frame. The problem is that my data frame has both discrete and numeric values in it. For simplicity's sake, we can use iris, the sample dataset that ships with R. I could use par(mfrow = c(x, y)) to split my plots and maybe mapply to cycle through each column? I'm unsure what's best here.

I'm thinking something akin to:

ggplot(iris, aes(Sepal.Length))+geom_density()

But instead plotted for each column. My concern is the "Species" column being discrete. Maybe geom_density wouldn't be the right plot to use here, but the idea is to see the distribution of each of the data frame's variables in one plot, even the discrete ones. Bar plots for the discrete values would serve the purpose. Basically I'm trying to do the following:

Any thoughts or advice would be appreciated!

Upvotes: 3

Views: 10647

Answers (1)

Juan Bosco

Reputation: 1430

You can use the function plot_grid from the cowplot package. This function takes a list of plots generated by ggplot and creates a new plot, combining them in a grid.

First, create a list of plots with lapply, using geom_density for numeric variables and geom_bar for everything else.

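library(ggplot2)
library(cowplot)

# Build one plot per column: a density for numeric columns, a bar chart otherwise.
my_plots <- lapply(names(iris), function(var_x) {
  # .data[[var_x]] looks the column up by name (aes_string() is deprecated)
  p <- ggplot(iris, aes(.data[[var_x]])) +
    labs(x = var_x)

  if (is.numeric(iris[[var_x]])) {
    p <- p + geom_density()
  } else {
    p <- p + geom_bar()
  }

  p  # return the finished plot explicitly
})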

Now we simply call plot_grid.

plot_grid(plotlist = my_plots)
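
If you would rather stay with base graphics, along the lines of the par(mfrow) idea in the question, a minimal sketch might look like the one below. The helper name plot_all_columns is made up for illustration and is not part of any package; it assumes that every non-numeric column can sensibly be tabulated with table().

```r
# Hypothetical helper (not from cowplot or ggplot2): base-graphics grid of
# one plot per column, density for numeric columns and bar plot otherwise.
plot_all_columns <- function(df) {
  # n2mfrow() suggests a rows-by-columns layout for the given number of plots
  old_par <- par(mfrow = grDevices::n2mfrow(ncol(df)))
  on.exit(par(old_par))  # restore the previous layout when the function exits

  kinds <- vapply(names(df), function(col) {
    x <- df[[col]]
    if (is.numeric(x)) {
      plot(density(x, na.rm = TRUE), main = col)
      "density"
    } else {
      barplot(table(x), main = col)
      "bar"
    }
  }, character(1))

  # Return, invisibly, which kind of plot was drawn for each column
  invisible(kinds)
}

plot_all_columns(iris)
```

This avoids extra packages, at the cost of the finer layout control that plot_grid gives you.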

Upvotes: 8
