Dev P

Reputation: 449

Automatically highlighting outliers in ggplot

I have a data frame df. When plotting it with ggplot, can I also highlight the outliers? Below is the sample code:

library(ggplot2)

df <- data.frame(col = runif(100, min = 0, max = 100000))
df$D <- 1:100
ggplot(df, aes(x = D, y = col)) + geom_line()

Is there a way to highlight the outliers here?

Upvotes: 1

Views: 1349

Answers (1)

www

Reputation: 39154

We can define a function for this. line_outlier_plot takes four arguments: df has the same format as your example data frame, outlier_color and normal_color set the colors of the points, and drop controls whether unused categories are dropped from the legend.

We have to decide what counts as an outlier. Here, I treat a value as an outlier if it lies more than 3 standard deviations above or below the mean. You can use your own rule by modifying the condition in the ifelse statement.
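To see the 3-standard-deviation rule on its own, here is a minimal base R sketch. The toy vector x is made up for illustration and is not part of the question's data; it holds 29 ordinary values and one extreme value.

```r
# Toy data (assumed for illustration): 29 ordinary values and one extreme value
x <- c(rep(10, 15), rep(12, 14), 100)

# TRUE where a value is more than 3 standard deviations from the mean
is_outlier <- x > mean(x) + 3 * sd(x) |
              x < mean(x) - 3 * sd(x)

# Positions of the flagged values
which(is_outlier)
```

Note that with very few observations a single extreme value inflates the standard deviation enough that it may never cross the 3-SD threshold, so this rule works best on reasonably long series.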

library(ggplot2)

line_outlier_plot <- function(df, outlier_color = "red", normal_color = "black", drop = FALSE){
  # Assign a label to show if it is an outlier or not
  df$label <- ifelse(df$col > mean(df$col) + 3 * sd(df$col) |
                     df$col < mean(df$col) - 3 * sd(df$col), "Outlier", "Normal")

  df$label <- factor(df$label, levels = c("Normal", "Outlier"))

  # Set the color palette
  pal <- c("Outlier" = outlier_color, "Normal" = normal_color)

  p <- ggplot(df, aes(x = D, y = col)) +
    geom_line() +
    geom_point(aes(color = label)) +
    scale_color_manual(values = pal, drop = drop)

  return(p)
}

Below is an example of the plot using this function.

set.seed(155)

df <- data.frame(col=rnorm(1000))
df$D <- c(1:1000)

line_outlier_plot(df)

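If the 3-SD rule does not suit your data, one possible alternative is the 1.5 * IQR rule, similar to the convention boxplots use for their whiskers. The sketch below is only an illustration: flag_outliers_iqr and its k argument are hypothetical names, not part of the function above.

```r
# Hypothetical helper (an assumption, not part of the answer's function):
# label each value "Normal" or "Outlier" using the k * IQR rule.
flag_outliers_iqr <- function(x, k = 1.5) {
  # First and third quartiles of the data
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])

  # A value is an outlier if it falls outside [Q1 - fence, Q3 + fence]
  is_out <- x < q[1] - fence | x > q[2] + fence

  factor(ifelse(is_out, "Outlier", "Normal"), levels = c("Normal", "Outlier"))
}

# Toy input (made up for illustration)
labels <- flag_outliers_iqr(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50))
table(labels)
```

To use it in the plot, the ifelse and factor lines inside line_outlier_plot could be replaced with df$label <- flag_outliers_iqr(df$col); the rest of the function would stay the same.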

Upvotes: 1
