MIH

Stretching the x axis and applying a different binwidth to a range of values in a histogram in ggplot2 (R)

Here is an example ggplot that I would like to build. My data have many values concentrated in a small part of the histogram, so I would like to stretch the x axis disproportionately over that part (here, between the values 80, 81, 82, 83, 84 and 85). The tick marks would then be evenly spaced on the plot, even though the distance between them would no longer be proportional to the difference in values. I would also like to use a different bin size in that part of the histogram (say, binwidth = 1).

library(ggplot2)

set.seed(42)
data <- data.frame(vals = rnorm(n = 30, mean = 80, sd = 20),
                   respondent = seq(1, 30, 1),
                   category = c("A","B","B","A","A","B","B","A","A","A",
                                "A","B","B","A","A","B","B","A","A","B",
                                "B","A","A","B","B","A","A","B","B","A"))
# Plot the number of vals
ggplot(data, aes(x = vals, fill = category)) +
  geom_histogram(position = "stack", binwidth = 5) +
  ggtitle("plot") +
  # scale_x_continuous(breaks = c(40,50,60,70,80,81,82,83,84,85,95,105,115)) +
  theme_minimal() +
  ylab("Number of respondents") + xlab("Number of vals")

Z.Lin

You can calculate the bar sizes (widths and heights) yourself and draw the histogram as a series of stacked rectangles.

Using the diamonds dataset for illustration, suppose this is our original histogram and we want to zoom in on the [500, 1000] price range:

library(ggplot2)  # also provides the diamonds dataset

ggplot(diamonds,
       aes(x = price, fill = color)) +
  geom_histogram(binwidth = 500) +
  theme_bw()

[image: original histogram]

Define your preferred axis breaks:

x.axis.breaks <- c(0,                      # binwidth = 500
                   seq(500, 900, 100),     # binwidth = 100
                   seq(1000, 19000, 500))  # binwidth = 500
> x.axis.breaks
 [1]     0   500   600   700   800   900  1000  1500  2000  2500  3000  3500  4000  4500
[15]  5000  5500  6000  6500  7000  7500  8000  8500  9000  9500 10000 10500 11000 11500
[29] 12000 12500 13000 13500 14000 14500 15000 15500 16000 16500 17000 17500 18000 18500
[43] 19000
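To see how these breaks become bar edges in the next step, here is a minimal base-R sketch on a few hypothetical prices (toy.prices is made up for illustration, not taken from diamonds). cut() closes intervals on the right by default, so a price of exactly 600 lands in (500, 600], and the integer code of each factor level indexes straight into x.axis.breaks:

```r
# Breaks as defined above
x.axis.breaks <- c(0, seq(500, 900, 100), seq(1000, 19000, 500))

# Hypothetical toy prices, used only to illustrate the bin lookup
toy.prices <- c(326, 550, 600, 999, 1200, 18823)

# cut() returns a factor; its integer code is the index of the interval
bin <- as.integer(cut(toy.prices, breaks = x.axis.breaks))

# interval i is (breaks[i], breaks[i + 1]], so these are each bar's left/right edges
xmin <- x.axis.breaks[bin]
xmax <- x.axis.breaks[bin + 1]
print(data.frame(toy.prices, bin, xmin, xmax))
```

This is the same indexing that the xmin / xmax lines in the dplyr pipeline rely on.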

Calculate xmin / xmax / ymin / ymax for each interval:

library(dplyr)

diamonds2 <- diamonds %>%
  mutate(price.cut = cut(price,
                         breaks = x.axis.breaks,
                         # label each bin by its lower edge, as in the output below
                         labels = head(x.axis.breaks, -1))) %>%
  count(price.cut, color) %>%
  mutate(xmin = x.axis.breaks[as.integer(price.cut)],
         xmax = x.axis.breaks[as.integer(price.cut) + 1]) %>%
  group_by(price.cut) %>%
  arrange(desc(color)) %>%
  mutate(ymax = cumsum(n)) %>%
  mutate(ymin = lag(ymax)) %>%
  mutate(ymin = ifelse(is.na(ymin), 0, ymin)) %>%
  ungroup()

> diamonds2
# A tibble: 294 x 7
   price.cut color     n  xmin  xmax  ymax  ymin
   <fct>     <ord> <int> <dbl> <dbl> <int> <dbl>
 1 0         J       158     0   500   158     0
 2 500       J        80   500   600    80     0
 3 600       J        84   600   700    84     0
 4 700       J        51   700   800    51     0
 5 800       J        43   800   900    43     0
 6 900       J        47   900  1000    47     0
 7 1000      J       145  1000  1500   145     0
 8 1500      J       198  1500  2000   198     0
 9 2000      J       163  2000  2500   163     0
10 2500      J        72  2500  3000    72     0
# ... with 284 more rows
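The ymin / ymax columns are just running totals within each bin. A minimal sketch with made-up counts for a single bin (already in stacking order, bottom to top) shows that taking the previous top edge, with 0 for the first rectangle, gives the same bottom edges as the lag() / ifelse(is.na()) steps in the pipeline above:

```r
# Made-up counts for one price bin, in stacking order (bottom to top)
color <- c("J", "I", "H")
n <- c(10L, 20L, 5L)

# top edge of each rectangle: running total of the counts
ymax <- cumsum(n)
# bottom edge: previous top edge, 0 for the first rectangle
ymin <- c(0L, head(ymax, -1))

print(data.frame(color, n, ymin, ymax))
```

Equivalently, ymin is simply ymax - n, which avoids the NA handling altogether.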

Plot:

p <- ggplot(diamonds2,
       aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) +
  geom_rect() +
  theme_bw()

p

[image: plot with different binwidths]

I'm not inclined to "stretch" part of a continuous axis, as it distorts interpretation: equal distances on the axis would no longer mean equal differences in value. Instead, you can zoom in using facet_zoom() from the ggforce package:

library(ggforce)

p + facet_zoom(x = xmin >= 500 & xmax <= 1000)

[image: with facet zoom]

If you don't want the neighbouring bars to be visible in the zoomed facet, set the x-axis expansion to 0:

p + 
  facet_zoom(x = xmin >= 500 & xmax <= 1000) +
  scale_x_continuous(expand = c(0, 0))

[image: with facet zoom & zero expansion]

Edit

To use a wider bin at the end (covering everything above 15000) with a customised label, make the following changes:

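# use even binwidth (500) up to 15000, then jump to the end
x.axis.breaks <- c(0,                      # binwidth = 500
                   seq(500, 900, 100),     # binwidth = 100
                   seq(1000, 15000, 500),  # binwidth = 500
                   19000)                  # everything else

# (re-run the diamonds2 pipeline above so it uses the new breaks)

# reduce the largest xmax value in order to have the same bar width
diamonds2 <- diamonds2 %>%
  mutate(xmax = ifelse(xmax == max(xmax),
                       xmin + 500,
                       xmax))

# rebuild the plot from the updated data (a ggplot object keeps its own copy
# of the data, so the old p would still draw the old bars),
# then define breaks & labels for x-axis
p <- ggplot(diamonds2,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) +
  geom_rect() +
  theme_bw() +
  scale_x_continuous(breaks = seq(0, 15000, 5000),
                     labels = c(seq(0, 10000, 5000),
                                "15000+"))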
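A minimal base-R sketch of the two tweaks, on three hypothetical bars (the values are made up, not the real diamonds2 rows): the open-ended last bar is redrawn 500 wide, so it looks like its neighbours while its height still counts the whole (15000, 19000] interval, and the label vector needs exactly one entry per break:

```r
# Hypothetical edges of the last three bars under the new breaks
bars <- data.frame(xmin = c(14000, 14500, 15000),
                   xmax = c(14500, 15000, 19000))

# shrink only the bar that reaches the maximum, so it is drawn 500 wide
bars$xmax <- ifelse(bars$xmax == max(bars$xmax), bars$xmin + 500, bars$xmax)

# one label per break; the last break marks the open-ended bin
x.breaks <- seq(0, 15000, 5000)
x.labels <- c(seq(0, 10000, 5000), "15000+")  # numbers are coerced to character
stopifnot(length(x.breaks) == length(x.labels))

print(bars)
print(setNames(x.labels, x.breaks))
```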
