Lin Ma

Reputation: 10139

Show only the 0-90% or 0-95% percentile range

Here are my code and plot results. Due to some outliers, the x-axis is very long. Is there a simple way in R to filter df$foo down to the 0-90% or 0-95% percentile range, so that I plot only the typical values? Thanks.

df <- read.csv('~/Downloads/foo.tsv', sep='\t', header=FALSE, stringsAsFactors=FALSE)
names(df) <- c('a', 'foo', 'goo')
df$foo <- as.numeric(df$foo)
goodValue <- df$foo
summary(goodValue)
hist(goodValue,main="Distribution",xlab="foo",breaks=20)


Upvotes: 1

Views: 2372

Answers (2)

shayaa

Reputation: 2797

Suppose you wanted to examine the diamonds dataset that ships with ggplot2 (I don't have your data).

library(ggplot2)
library(dplyr)
diamonds %>% ggplot() + geom_histogram(aes(x = price))


You might decide to examine the deciles of your data and, since the tail is not of interest to you, throw away the top decile. You could do that as follows, with a free x scale so that you can see what is happening within each decile.

diamonds %>% mutate(ntile = ntile(price, 10)) %>% 
  filter(ntile < 10) %>%
  ggplot() + geom_histogram(aes(x = price)) + 
  facet_wrap(~ntile, scales = "free_x") 

But be cautious: although seeing your data at a much finer granularity has its benefits, notice how you can now barely tell that your data are roughly exponentially distributed (with a heavy tail, as commodity price data often are).

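If you would rather keep a single histogram on one x-axis, here is a minimal sketch of the same idea without the facets. It assumes the same diamonds data; the names cutoff and trimmed are just illustrative, and bins = 30 is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)

# 90th percentile of price; rows above this cutoff are treated as the tail
cutoff <- quantile(diamonds$price, 0.90)

# Keep only rows at or below the cutoff (drops roughly the top decile)
trimmed <- diamonds %>% filter(price <= cutoff)

# One histogram on a common x-axis, so the overall shape stays visible
p <- ggplot(trimmed) + geom_histogram(aes(x = price), bins = 30)
print(p)
```

Because all remaining values share one axis, the skew of the distribution is still visible, which the per-decile panels tend to hide.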

Upvotes: 3

Warner

Reputation: 1363

Maybe this is what you're looking for?

a <- c(rnorm(99), 50) # create some data with one outlier
quant <- as.numeric(quantile(a, c(0, 0.9))) # get the 0 and 0.9 quantiles
hist(a[a >= quant[1] & a <= quant[2]]) # histogram of only the data within these bounds (inclusive, so the minimum is kept)
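Applied to a data frame column like the df$foo in the question, the same approach might look like the sketch below. The data here are made up as a stand-in, and the NA handling is an assumption on my part: as.numeric() can produce NA for unparseable entries, and quantile() stops with an error on NA unless na.rm = TRUE is set.

```r
set.seed(1)  # reproducible stand-in data, not the asker's file
df <- data.frame(foo = c(rnorm(99), 50, NA))  # one outlier and one NA

# Upper bound at the 95th percentile; na.rm = TRUE skips missing values
upper <- quantile(df$foo, 0.95, na.rm = TRUE)

# Keep non-missing values at or below the bound
kept <- df$foo[!is.na(df$foo) & df$foo <= upper]

hist(kept, main = "Distribution (0-95th percentile)", xlab = "foo", breaks = 20)
```

Swap 0.95 for 0.90 to cut at the 90th percentile instead.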

Upvotes: 3
