rnewbie

Reputation: 155

Plotting extreme values on a histogram with ggplot

Data:

data = data.frame(rnorm(250, 90, sd = 30))

I want to create a histogram with bins of fixed width, except that all observations above one arbitrary cutoff, or below another, are grouped into their own bins. Taking the data above as an example, I want binwidth = 10, but with all values above 100 together in one bin and all values below 20 together in another.

I looked at some answers, but they make no sense to me since they are mostly code. I would appreciate it greatly if somebody can explain the steps.

Upvotes: 0

Views: 1706

Answers (1)

eipi10

Reputation: 93811

The examples below show how to create the desired histogram in base graphics and with ggplot2. Note that because the first and last bins are much wider than the rest, the resulting histogram will look quite distorted compared to one with a constant bin width.

Base Graphics

The R function hist creates the histogram and allows us to set whatever bins we want using the breaks argument:

# Fake data
set.seed(1049)
dat = data.frame(value=rnorm(250, 90, 30))

hist(dat$value, breaks=c(min(dat$value), seq(20,100,10), max(dat$value)))

In the code above, c(min(dat$value), seq(20,100,10), max(dat$value)) sets breaks that start at the lowest data value and end at the highest data value. In between, seq creates a sequence of breaks that runs from 20 to 100 in increments of 10. The result is one bin holding everything below 20, eight bins of width 10, and one bin holding everything above 100. Because the bins are not all the same width, hist plots densities rather than counts by default.
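To see what those breaks do before drawing anything, here is a minimal sketch that computes the histogram with plot = FALSE and lists each bin. The names brks, h and bins are just illustrative, and the construction assumes the data actually extend below 20 and above 100 (otherwise the outer breaks would land inside the regular sequence).

```r
# Fake data, same seed as above
set.seed(1049)
dat <- data.frame(value = rnorm(250, 90, 30))

# Outer breaks come from the data; inner breaks run 20, 30, ..., 100
# (assumes min(dat$value) < 20 and max(dat$value) > 100)
brks <- c(min(dat$value), seq(20, 100, 10), max(dat$value))

# Compute the histogram without drawing it
h <- hist(dat$value, breaks = brks, plot = FALSE)

# One row per bin: edges, width, count, and density = count / (n * width)
bins <- data.frame(
  lower   = head(h$breaks, -1),
  upper   = tail(h$breaks, -1),
  width   = diff(h$breaks),
  count   = h$counts,
  density = h$density
)
print(bins)
```

The density column is what ends up on the y-axis: a wide bin with many observations can still be a short bar, because its count is spread over a larger width.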

ggplot2

library(ggplot2)

ggplot(dat, aes(value)) +
  geom_histogram(breaks=c(min(dat$value), seq(20,100,10), max(dat$value)),
                 # after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
                 aes(y=after_stat(density)), color="grey30", fill=hcl(240,100,65)) +
  theme_light()

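If the distortion from the wide outer bins is a problem, one possible alternative (not the approach shown above, just a sketch) is to assign each value to a labelled bin with cut and draw counts as a bar chart, so the two catch-all bins get the same bar width as the others. The labels and the names brks, labs, bin and p are made up for illustration.

```r
library(ggplot2)

# Fake data, same seed as above
set.seed(1049)
dat <- data.frame(value = rnorm(250, 90, 30))

# Open-ended outer bins: everything below 20 and everything from 100 up
brks <- c(-Inf, seq(20, 100, 10), Inf)
labs <- c("< 20",
          paste(seq(20, 90, 10), seq(30, 100, 10), sep = "-"),
          ">= 100")

# right = FALSE makes each interval closed on the left: [20, 30), [30, 40), ...
dat$bin <- cut(dat$value, breaks = brks, labels = labs, right = FALSE)

# Bar chart of counts per bin; drop = FALSE keeps empty bins on the axis
p <- ggplot(dat, aes(bin)) +
  geom_bar(color = "grey30", fill = hcl(240, 100, 65)) +
  scale_x_discrete(drop = FALSE) +
  labs(x = "value", y = "count") +
  theme_light()
print(p)
```

The trade-off is that the x-axis is now categorical, so the plot shows counts per labelled group rather than a true density over a continuous scale.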

Upvotes: 1
