Reputation: 4477
I produced a faceted plot that I'm very satisfied with except for one issue. On a couple of the panels, one or two outliers completely ruin the graph. I could use ylim(), but I'm using facet_grid(scales="free"), so each panel has its own limits and a single limit wouldn't fit them all. Here is my code and the graph it produced.
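library(reshape2)  # assumed source of melt()
library(ggplot2)
melted_df <- melt(df, id = 'ca_rmsd')
# End the line with `+` so the facet layer is part of the same plot expression
ggplot(melted_df, aes(ca_rmsd, value)) + geom_point() +
  facet_grid(variable ~ ., scales = "free")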
(source: willisjr at structbio.vanderbilt.edu)
As you can see the top plot has a data point WAY outside the axis that smashes the rest.
Upvotes: 4
Views: 6698
Reputation: 3308
Here's a possibility:
library(ggplot2)
n <- 1000
df <- data.frame(x=rnorm(n), y=rnorm(n),
                 label=sample(letters[1:4], size=n,
                              replace=TRUE))
df$y[1:50] <- 50 # Add some outliers
## Similar to your plot
ggplot(df, aes(x, y)) + geom_point() + facet_wrap(~ label)
library(plyr)
df.quantiles <- ddply(df, "label", summarise,
                      q99=quantile(y, probs=0.99),
                      q90=quantile(y, probs=0.90))
df <- merge(df, df.quantiles, by="label", all.x=TRUE)
## More or less what you want?
ggplot(df[df$y < df$q99, ],
       aes(x, y)) + geom_point() + facet_wrap(~ label)
This assumes the outliers are all on the high side, but you could easily extend it to trim the low side as well.
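Extending the filter to both tails could look roughly like the sketch below. It is only an illustration: the 1% and 99% cutoffs, the seed, the simulated low outliers, and the names q01 and trimmed are assumptions of mine, and it uses base R's ave() instead of ddply() plus merge() just to keep the example short.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the example is reproducible
n <- 1000
df <- data.frame(x = rnorm(n), y = rnorm(n),
                 label = sample(letters[1:4], size = n, replace = TRUE))
df$y[1:25]  <-  50   # some high outliers
df$y[26:50] <- -50   # some low outliers

# Per-label 1st and 99th percentiles, repeated on every row of that label
df$q01 <- ave(df$y, df$label, FUN = function(v) quantile(v, probs = 0.01))
df$q99 <- ave(df$y, df$label, FUN = function(v) quantile(v, probs = 0.99))

# Keep only the rows strictly between the two cutoffs of their own label
trimmed <- df[df$y > df$q01 & df$y < df$q99, ]

ggplot(trimmed, aes(x, y)) + geom_point() +
  facet_wrap(~ label, scales = "free")
```

Note that this drops roughly the top and bottom 1% of every label, whether or not those points are really outliers.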
You could try something slightly more sophisticated, maybe
df[df$y < df$q99 | (df$q99 / df$q90) < some.ratio, ]
where you choose some.ratio so that the largest 1% of y values in a facet are thrown out only when they are deemed to be outliers (q99 much larger than q90), rather than in every facet.
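As a rough sketch of that ratio idea, here is a version where only one label contains outliers. The threshold of 5 for some.ratio, the seed, and the data layout are assumptions chosen for illustration, not recommended values, and the ratio only makes sense when q90 is positive.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rnorm(1000), y = rnorm(1000),
                 label = rep(letters[1:4], each = 250))
df$y[1:10] <- 50  # outliers in label "a" only

# Per-label quantiles, repeated on every row of that label
df$q99 <- ave(df$y, df$label, FUN = function(v) quantile(v, probs = 0.99))
df$q90 <- ave(df$y, df$label, FUN = function(v) quantile(v, probs = 0.90))

some.ratio <- 5  # hypothetical threshold; tune it for your own data

# Keep a row if it is below its label's q99, or if its label shows no
# suspicious gap between q99 and q90 (so nothing needs trimming there)
kept <- df[df$y < df$q99 | (df$q99 / df$q90) < some.ratio, ]

ggplot(kept, aes(x, y)) + geom_point() + facet_wrap(~ label)
```

Here label "a" loses its ten outliers, while the other three labels should be left untouched because their q99/q90 ratio stays well below the threshold.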
Hope that helps.
Upvotes: 2