Illimar Rekand

Plotting only data up to the 90th percentile in a ggarrange plot

I have a plot in which I compare several (around 12) unrelated descriptors. To make it easier to display all of these plots together, I build a list of plots:

library(ggplot2)
library(ggpubr)   # provides ggarrange()

# One faceted histogram per numeric column of iris
comb <- lapply(colnames(iris[1:4]), function(x) ggplot(iris, aes(x = get(x))) +
                 geom_histogram(position = "identity", aes(y = after_stat(ncount), fill = Species), bins = 10) +
                 theme_classic() +
                 facet_grid(Species ~ ., scales = "free_y") +
                 theme(legend.position = "none",
                       panel.spacing = unit(2, "lines"),
                       legend.title = element_blank(),
                       strip.background = element_blank(),
                       strip.text.y = element_blank(),
                       plot.margin = unit(c(10, 10, 10, 10), "points")
                 ) +
                 xlab(x) +
                 scale_x_continuous()
)

which I then pass to the ggarrange function from ggpubr

ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2) 

to create a plot which suits my needs:

[Image: the histograms arranged by ggarrange]

However, some of my columns contain extreme outliers, so I need plots that show only the data up to the 90th percentile of each column in my data frame.

I would like to implement something similar to the solution presented by Warner in this question: (show only 0-90% or 0-95% percentile), but I cannot get it to work with my code. What I am looking for is a way to apply the information obtained from the line:

quantiles <- lapply(iris[1:4], quantile, c(0, 0.9)) # find the 0% and 90% quantiles of each numeric column (quantile() rejects the Species factor)

so that each plot produced by the lapply call above only displays the data up to the 90th percentile of its column.
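A minimal base-R sketch of how the `quantiles` list could be turned into per-column trimmed data, assuming (as with `iris`) that the numeric columns are the first four. `trim_to_range()` is a hypothetical helper name, not part of any package; the sketch only prepares the data and does not plot anything.

```r
# quantile() refuses the unordered factor Species, so only the numeric columns are used
quantiles <- lapply(iris[1:4], quantile, c(0, 0.9))

# Each element is a named numeric vector with entries "0%" and "90%"
print(quantiles[["Sepal.Length"]])

# Hypothetical helper: rows of iris whose column `x` lies inside that
# column's stored 0%-90% range
trim_to_range <- function(x) {
  lims <- quantiles[[x]]
  iris[iris[[x]] >= lims[["0%"]] & iris[[x]] <= lims[["90%"]], , drop = FALSE]
}

numeric_cols <- colnames(iris[1:4])
trimmed <- setNames(lapply(numeric_cols, trim_to_range), numeric_cols)

# Number of rows left for each column (out of 150)
print(sapply(trimmed, nrow))
```

Inside the lapply above, `ggplot(trimmed[[x]], aes(x = get(x)))` could then take the place of `ggplot(iris, aes(x = get(x)))`, so that each histogram only sees the rows within its own column's 0-90% range.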


Answers (1)

Andrew Chisholm

I think you want to remove the data above the 90th percentile and plot what remains. Here's some code that does this. I moved the plotting code into a separate function to make it easier to debug, and made the quantile a parameter so that it is easy to change. I also used aes_string in the ggplot call instead of get (aes_string has since been deprecated in ggplot2; aes(x = .data[[x]]) is the current equivalent).
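The filtering idea can be checked on its own before any plotting. The base-R sketch below uses a hypothetical helper, `keep_below_quantile()`, that keeps the rows at or below the q-th quantile of one column, and reports what fraction of rows survives for two values of q. Because of tied values and interpolation between data points, that fraction need not equal q exactly.

```r
# Hypothetical helper: keep the rows of `df` whose column `x` is at or below
# that column's q-th quantile (rows with a missing value in `x` are dropped)
keep_below_quantile <- function(df, x, q) {
  cutoff <- quantile(df[[x]], q, names = FALSE, na.rm = TRUE)
  df[!is.na(df[[x]]) & df[[x]] <= cutoff, , drop = FALSE]
}

numeric_cols <- colnames(iris)[1:4]

# Fraction of rows that survive each cutoff, per column; ties and
# interpolation mean it is close to, but not always exactly, q
for (q in c(0.9, 0.95)) {
  kept <- vapply(numeric_cols,
                 function(x) nrow(keep_below_quantile(iris, x, q)) / nrow(iris),
                 numeric(1))
  print(round(kept, 3))
}
```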

library(ggplot2)
library(ggpubr)   # provides ggarrange()

myplot <- function(x, q) {
    cutoff <- quantile(iris[[x]], q)                          # Value of the q-th quantile of column x
    filtered_data <- dplyr::filter(iris, .data[[x]] <= cutoff)  # Keep only rows at or below that value
    ggplot(filtered_data, aes_string(x = x)) +
        geom_histogram(position = "identity", aes(y = after_stat(ncount), fill = Species), bins = 10) +
        theme_classic() +
        facet_grid(Species ~ ., scales = "free_y") +
        theme(legend.position = "none",
              panel.spacing = unit(2, "lines"),
              legend.title = element_blank(),
              strip.background = element_blank(),
              strip.text.y = element_blank(),
              plot.margin = unit(c(10, 10, 10, 10), "points")
        ) +
        xlab(x) +
        scale_x_continuous()
}

# One trimmed histogram per numeric column, keeping data up to the 90th percentile
comb <- lapply(colnames(iris[1:4]), function(x) myplot(x, 0.9))
ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2)
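myplot() above is tied to iris and Species. Since the question mentions around 12 descriptors in the asker's own data frame, here is a sketch of a hypothetical generalisation, `quantile_hist()`, that takes the data frame and the grouping column as arguments and picks the numeric columns automatically. It assumes ggplot2 3.3.0 or later (for `after_stat()` and the `.data` pronoun) and drops some of the spacing tweaks for brevity.

```r
library(ggplot2)
library(ggpubr)   # provides ggarrange()

# Hypothetical generalisation of myplot(): the data frame and the grouping
# column are arguments instead of being hard-coded to iris and Species
quantile_hist <- function(df, x, q, group = "Species") {
  cutoff <- quantile(df[[x]], q, names = FALSE, na.rm = TRUE)
  filtered <- df[!is.na(df[[x]]) & df[[x]] <= cutoff, , drop = FALSE]
  ggplot(filtered, aes(x = .data[[x]], fill = .data[[group]])) +
    geom_histogram(aes(y = after_stat(ncount)), position = "identity", bins = 10) +
    facet_grid(rows = vars(.data[[group]]), scales = "free_y") +
    theme_classic() +
    theme(legend.position = "none",
          legend.title = element_blank(),
          strip.background = element_blank(),
          strip.text.y = element_blank()) +
    xlab(x)
}

# Pick every numeric column instead of listing them by position
num_cols <- names(Filter(is.numeric, iris))
comb <- lapply(num_cols, quantile_hist, df = iris, q = 0.9)
ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2)
```

With a dozen descriptors, `ncol` and `nrow` would need to grow (for example `ncol = 3, nrow = 4`) for all plots to fit in one grid.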

