Codutie

Reputation: 1075

How to remove low frequency bins in histogram

Say I have a data frame containing a column of numbers that I want to visualise as a histogram. I want to show only the bins that contain more than, say, 50 observations.

Step 1

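library(dplyr)    # provides the %>% pipe
library(ggplot2)

set.seed(10)
x <- data.frame(x = rnorm(1000, 50, 2))
p <- 
  x %>% 
  ggplot(aes(x)) +
  geom_histogram(bins = 30)  # 30 bins is the default, stated explicitly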

p

Step 2

pg <- ggplot_build(p)

pg$data[[1]]

As a check, when I print pg$data[[1]] I'd like to see only the rows where count >= 50.

Thank you

Upvotes: 1

Views: 568

Answers (2)

Merijn van Tilborg

Reputation: 5897

You could do something like this. You probably won't like the factor labels on the x-axis, but you can split each label into its two endpoints and plot their average (the bin midpoint) on the x-axis instead.

x %>%
  mutate(bin = cut(x, breaks = 30)) %>%  # 30 equal-width bins, as a factor
  group_by(bin) %>%
  mutate(count = n()) %>%                # observations per bin
  filter(count > 50) %>% 
  ungroup() %>%
  ggplot(aes(bin)) +
  geom_bar()                             # counts rows per bin label
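The midpoint idea mentioned above could be sketched as follows. This is only one possible approach, not necessarily what the answer's author had in mind: it assumes the default `cut()` labels of the form `(a,b]`, and the column names `lower`, `upper`, and `mid` are made up for illustration. Because `cut()` rounds its labels, the midpoints are approximate.

```r
library(dplyr)
library(ggplot2)

set.seed(10)
x <- data.frame(x = rnorm(1000, 50, 2))

binned <- x %>%
  mutate(bin = cut(x, breaks = 30)) %>%
  count(bin, name = "count") %>%          # one row per bin with its count
  filter(count > 50) %>%
  mutate(
    label = as.character(bin),
    # pull the two endpoints out of labels such as "(49.1,49.5]"
    lower = as.numeric(sub("^\\((.+),.*$", "\\1", label)),
    upper = as.numeric(sub("^.*,(.+)\\]$", "\\1", label)),
    mid   = (lower + upper) / 2           # approximate bin midpoint
  )

# numeric x-axis instead of factor labels
ggplot(binned, aes(mid, count)) +
  geom_col()
```

Since the data is aggregated before plotting, `geom_col()` is used here to draw the precomputed counts directly.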

Upvotes: 0

TarJae

Reputation: 79194

library(ggplot2)

# after_stat(count) is the current spelling of the older ..count.. notation
ggplot(x, aes(x = x, y = ifelse(after_stat(count) > 50, after_stat(count), 0))) +
  geom_histogram(bins = 30)

This version also labels every bin with its count, so you can still see the counts of the removed bins:

library(ggplot2)

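ggplot(x, aes(x = x, y = ifelse(after_stat(count) > 50, after_stat(count), 0))) +
  geom_histogram(bins = 30, fill = "green", color = "grey") +
  # label each bin with its true count; removed bins are labelled at y = 0
  stat_bin(aes(label = after_stat(count)), bins = 30, geom = "text", vjust = -0.7)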
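To confirm which bins are actually drawn, one option is to inspect the computed layer data, much like the check in the question. The sketch below is an assumption about how that check might look, not part of the original answer; it uses `layer_data()`, which returns the same data frame as `ggplot_build(p)$data[[1]]`, and the names `built` and `shown` are illustrative.

```r
library(ggplot2)

set.seed(10)
x <- data.frame(x = rnorm(1000, 50, 2))

p <- ggplot(x, aes(x = x, y = ifelse(after_stat(count) > 50, after_stat(count), 0))) +
  geom_histogram(bins = 30)

# computed data for the first (and only) layer
built <- layer_data(p)

# bins that are drawn with a non-zero height
shown <- built[built$y > 0, c("xmin", "xmax", "y")]
shown
```

Note that the suppressed bins are still present in the layer data with `y` equal to 0; they are hidden rather than dropped.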

Upvotes: 2
