Reputation: 449
I have a data frame df. When plotting it with ggplot, can we also highlight the outliers? Below is the sample code:
df <- data.frame(col=runif(100, min=0, max=100000))
df$D <- c(1:100)
ggplot(df,aes(x=D,y=col))+geom_line()
Is there a way to highlight the outliers here?
Upvotes: 1
Views: 1349
Reputation: 39154
We can define a function for this. The line_outlier_plot function has four arguments. df has the same format as your example data frame. outlier_color and normal_color specify the colors of the points. drop indicates whether unused categories are dropped from the legend.
We have to define how to determine an outlier. Here, I decided that an outlier is a value more than 3 standard deviations above or below the mean. You can define your own rule for determining outliers by modifying the condition in the ifelse call.
library(ggplot2)
line_outlier_plot <- function(df, outlier_color = "red", normal_color = "black", drop = FALSE){
  # Assign a label to show if it is an outlier or not
  df$label <- ifelse(df$col > mean(df$col) + 3 * sd(df$col) |
                       df$col < mean(df$col) - 3 * sd(df$col), "Outlier", "Normal")
  df$label <- factor(df$label, levels = c("Normal", "Outlier"))
  # Set the color palette
  pal <- c("Outlier" = outlier_color, "Normal" = normal_color)
  p <- ggplot(df, aes(x = D, y = col)) +
    geom_line() +
    geom_point(aes(color = label)) +
    scale_color_manual(values = pal, drop = drop)
  return(p)
}
Below is an example of the plot using this function.
set.seed(155)
df <- data.frame(col=rnorm(1000))
df$D <- c(1:1000)
line_outlier_plot(df)
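If the 3-standard-deviation rule does not suit your data, one common alternative is the 1.5 × IQR rule (Tukey's fences). The following is only a sketch of how the labeling step could be swapped out; label_outliers_iqr and its k argument are hypothetical names I made up for illustration and are not part of the function above.

```r
# Hypothetical helper (an assumption, not part of the original function):
# label values outside Q1 - k * IQR and Q3 + k * IQR as outliers.
label_outliers_iqr <- function(x, k = 1.5) {
  # First and third quartiles (default quantile type 7)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  is_out <- x < q[1] - fence | x > q[2] + fence
  # Same factor levels as in line_outlier_plot, so the palette still applies
  factor(ifelse(is_out, "Outlier", "Normal"), levels = c("Normal", "Outlier"))
}

set.seed(155)
df <- data.frame(col = rnorm(1000))
df$D <- seq_len(nrow(df))
df$label <- label_outliers_iqr(df$col)
# Count how many points fall into each category
table(df$label)
```

To use this rule in the plot, the ifelse assignment inside line_outlier_plot could be replaced with df$label <- label_outliers_iqr(df$col); the rest of the function should work unchanged because the factor levels are the same.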
Upvotes: 1