Reputation: 10139
Here is my code and the resulting plot. Due to some outliers, the x-axis is very long. Is there a simple way in R to filter df$foo
to only the 0-90% or 0-95% percentile range, so that I plot only the typical values? Thanks.
df <- read.csv('~/Downloads/foo.tsv', sep='\t', header=FALSE, stringsAsFactors=FALSE)
names(df) <- c('a', 'foo', 'goo')
df$foo <- as.numeric(df$foo)
goodValue <- df$foo
summary(goodValue)
hist(goodValue,main="Distribution",xlab="foo",breaks=20)
Upvotes: 1
Views: 2372
Reputation: 2797
Suppose you wanted to examine the diamonds dataset that ships with ggplot2 (I don't have your data).
library(ggplot2)
library(dplyr)
diamonds %>% ggplot() + geom_histogram(aes(x = price))
You might decide to examine the deciles of your data and, since the tail is not of interest to you, throw away the top decile. You could do that as follows, with a free x scale so that you can see what is happening within each decile.
diamonds %>% mutate(ntile = ntile(price, 10)) %>%
filter(ntile < 10) %>%
ggplot() + geom_histogram(aes(x = price)) +
facet_wrap(~ntile, scales = "free_x")
But be cautious: although seeing your data at a much finer granularity has its benefits, notice that from these plots you can barely tell that the data is roughly exponentially distributed (with a heavy tail, as commodity price data often is).
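If a single histogram is preferred over one panel per decile, the same idea can be sketched by filtering on a quantile cutoff directly. The names `cutoff` and `trimmed` and the 0.90 threshold are illustrative assumptions, not something from the question.

```r
library(ggplot2)
library(dplyr)

# 90th percentile of price; the 0.90 threshold is an illustrative choice
cutoff <- quantile(diamonds$price, 0.90, names = FALSE)

# Keep only rows at or below the cutoff
trimmed <- diamonds %>% filter(price <= cutoff)

# One histogram of the trimmed data
ggplot(trimmed) + geom_histogram(aes(x = price), bins = 30)
```

Comparing this plot with the unfiltered histogram is a quick way to see how much of the tail was removed.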
Upvotes: 3
Reputation: 1363
Maybe this is what you're looking for?
a <- c(rnorm(99), 50) # create some data with one extreme outlier
quant <- as.numeric(quantile(a, c(0, 0.9))) # get the 0 and 0.9 quantiles
hist(a[a >= quant[1] & a <= quant[2]]) # histogram of only the data within these bounds (inclusive, so the minimum is kept)
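Applied to a data frame column like the one in the question, this might look roughly as follows. This is a minimal sketch on made-up data: the `df` below is a synthetic stand-in for the asker's file, and `upper`, `keep`, and the 0.95 threshold are assumed names and values.

```r
set.seed(1)

# Synthetic stand-in for the asker's data: one outlier and one missing value
df <- data.frame(foo = c(rnorm(99), 50, NA))

# 95th percentile, ignoring NAs (as.numeric() on bad input can produce them)
upper <- quantile(df$foo, 0.95, na.rm = TRUE, names = FALSE)

# Keep non-missing values at or below the cutoff
keep <- !is.na(df$foo) & df$foo <= upper
goodValue <- df$foo[keep]

hist(goodValue, main = "Distribution", xlab = "foo", breaks = 20)
```

Passing `na.rm = TRUE` matters here because `quantile()` stops with an error when the input contains missing values.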
Upvotes: 3