Miya

Display only the top outliers in R

Is it possible to display only the data points above each box's upper whisker, i.e., the top outliers alone?

Code:

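library(ggplot2)
library(lubridate)  # mdy(), floor_date()

charges <- read.csv("tempcharges.csv")
data <- read.csv("discharges.csv")

#MERGING THE TWO DATA FRAMES on the shared encounter ID:
cdata <- merge(data, charges, by = "Enc")

#TRANSFORMING VARIABLES: keep the parsed dates inside the data frame so aes() finds them
cdata$discharge <- mdy(cdata$discharge_date)  # "m/d/y" text -> Date

#PLOT: one box per calendar month (month AND year, unlike month(), which merges years)
plots <- ggplot(cdata, aes(x = discharge, y = TotalCharge,
                           group = floor_date(discharge, "month"))) +
  scale_x_date(labels = function(z) format(z, format = "%b%y")) +
  geom_boxplot(notch = TRUE, na.rm = TRUE) +
  labs(title = "INPATIENT CHARGE DATA TREND",
       x = "Period Data",
       y = "Charges") +
  # zoom without dropping rows: ylim() would discard charges above 60000
  # before the box statistics (and therefore the outliers) are computed
  coord_cartesian(ylim = c(0, 60000))
plots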
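Since `geom_boxplot()` already computes the outliers for each box, one way to get only the top ones is to pull the computed statistics back out with ggplot2's `layer_data()` and keep the outliers above each box's upper whisker (`ymax`). This is a minimal sketch under assumptions: the data frame, its `discharge` and `charge` columns, and the simulated values are hypothetical stand-ins for the merged CSV data.

```r
library(ggplot2)
library(lubridate)  # floor_date()

set.seed(42)
# Hypothetical stand-in for the merged data: about 3 months of simulated charges
cdata <- data.frame(
  discharge = as.Date("2020-01-01") + sample(0:89, 300, replace = TRUE),
  charge    = rlnorm(300, meanlog = 9, sdlog = 0.6)
)

p <- ggplot(cdata, aes(x = discharge, y = charge,
                       group = floor_date(discharge, "month"))) +
  geom_boxplot()

# layer_data() returns the statistics computed for layer 1: one row per box,
# with the whisker ends (ymin, ymax) and a list column of that box's outliers
box_stats <- layer_data(p)

# Keep only the outliers lying above each box's upper whisker
top <- do.call(rbind, lapply(seq_len(nrow(box_stats)), function(i) {
  out <- box_stats$outliers[[i]]
  out <- out[out > box_stats$ymax[i]]
  data.frame(box = rep(i, length(out)), charge = out)
}))
top
```

Reading the values from the built plot guarantees they match the points `geom_boxplot()` draws, because they come from the same computation.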


Answers (1)

pdw

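You can isolate the outliers (values more than 1.5 * IQR above the third quartile or below the first quartile, the rule `geom_boxplot()` uses by default) and plot only those; for the top outliers alone, keep just the upper-fence condition in `filter()`. For example: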

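library(dplyr)    # filter() and the %>% pipe
library(ggplot2)

# simulated data with three planted extremes so some outliers appear
dat <- data.frame(row = 1:100, value = c(rnorm(97), 4, 5, -4))

# Tukey fences: 1.5 * IQR beyond the first and third quartiles
q   <- quantile(dat$value, c(0.25, 0.75))
iqr <- IQR(dat$value)

outliers <- dat %>% filter(value > q[2] + 1.5 * iqr | value < q[1] - 1.5 * iqr)

# plot only the outliers, labelled with their row number
ggplot(outliers, aes(x = 0, y = value)) +
  geom_point() +
  geom_text(aes(label = row), hjust = -2, size = 3)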
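In the question the boxes are per month, so the fence has to be computed within each month rather than over the whole column. A minimal sketch of that, assuming hypothetical simulated data with `discharge` and `charge` columns: it flags values above Q3 + 1.5 * IQR per calendar month, hides the default outlier points with `outlier.shape = NA`, and overlays only the top ones.

```r
library(dplyr)
library(ggplot2)
library(lubridate)  # floor_date()

set.seed(1)
# Hypothetical data standing in for the merged charges table
cdata <- data.frame(
  discharge = as.Date("2020-01-01") + sample(0:89, 300, replace = TRUE),
  charge    = rlnorm(300, meanlog = 9, sdlog = 0.6)
)

# Within each calendar month, flag values above Q3 + 1.5 * IQR
# (the upper fence geom_boxplot() uses with its default coef = 1.5)
flagged <- cdata %>%
  mutate(month = floor_date(discharge, "month")) %>%
  group_by(month) %>%
  mutate(upper_fence = quantile(charge, 0.75) + 1.5 * IQR(charge),
         top_outlier = charge > upper_fence) %>%
  ungroup()

top_outliers <- filter(flagged, top_outlier)

p <- ggplot(flagged, aes(x = month, y = charge, group = month)) +
  geom_boxplot(outlier.shape = NA) +                 # hide all default outlier points
  geom_point(data = top_outliers, colour = "red") +  # draw only the upper ones
  scale_x_date(date_labels = "%b%y")
p
```

`quantile()` and `IQR()` both default to type 7 quantiles, the same definition `stat_boxplot()` uses, so the red points should coincide with the upper outliers the boxplot would otherwise draw.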
