Reputation: 465
Let's say I have a vector like this:
mydata = c(1, 3, 4, 5, 6, 7, 8, 9, 10)
A five-break histogram would look like this:
h = hist(mydata, breaks=5)
How can I plot only the bins whose frequency count is above a threshold? In this case, that means any bin with a count greater than 1.
I would like to end up with the following histogram:
I know I can access the counts and breaks with h$counts and h$breaks, but I cannot think of a simple way to use these to filter out some bins.
Upvotes: 2
Views: 1767
Reputation: 3636
I'm assuming that if a bucket below the threshold sits in the middle of the histogram, you just want to drop that bucket.
Given that, it's a matter of adjusting your axis limits to the first and last non-zero buckets.
So for an initial histogram:
mydata2 <- c(1, 3, 4, 5, 6, 7, 3, 9, 10, 12)
h2 <- hist(mydata2, breaks=6)
It would be transformed like this:
h2$counts[ h2$counts < 2] <- 0
xmin <- h2$breaks[min(which(h2$counts != 0))]
xmax <- h2$breaks[max(which(h2$counts != 0)) + 1]
plot(h2, xlim = c(xmin, xmax))
If you want to merge the middle bucket into neighboring buckets instead, then that gets more complicated and depends on what merging rules you want to use.
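As one illustration of a merging rule (an assumption, not something the question asks for), the sketch below folds each below-threshold bucket into its left neighbour by deleting the break the two share; the first bucket folds into its right neighbour instead. The names low, drop_idx, keep_idx and h_merged are made up for this example, and a merged bucket can still end up below the threshold.

```r
mydata2 <- c(1, 3, 4, 5, 6, 7, 3, 9, 10, 12)
h2 <- hist(mydata2, breaks = 6, plot = FALSE)

# Buckets below the threshold
low <- which(h2$counts < 2)

# Bucket i lies between break i and break i + 1.
# Deleting break i merges bucket i into bucket i - 1;
# for the first bucket, delete break 2 to merge it to the right.
drop_idx <- ifelse(low == 1, 2, low)

# setdiff() keeps all breaks when nothing is below the threshold
keep_idx <- setdiff(seq_along(h2$breaks), drop_idx)

# Re-bin the data on the reduced set of breaks
h_merged <- hist(mydata2, breaks = h2$breaks[keep_idx], plot = FALSE)
plot(h_merged)
```

If the merged buckets end up with unequal widths, plot() draws densities rather than counts by default.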
Upvotes: 1
Reputation: 8837
In this particular case you can do it like this, but it's not generalizable beyond bins being contiguous and at the left end of the histogram.
f <- -which(h$counts < 2)
h[1:4] <- lapply(h[1:4], "[", f)
h
# $breaks
# [1] 2 4 6 8 10
#
# $counts
# [1] 2 2 2 2
#
# $density
# [1] 0.1111111 0.1111111 0.1111111 0.1111111
#
# $mids
# [1] 3 5 7 9
#
# $xname
# [1] "mydata"
#
# $equidist
# [1] TRUE
#
# attr(,"class")
# [1] "histogram"
If you want to cover cases where the low bins are at either end, you'll have to step up a little in code complexity.
mydata <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 12)
h <- hist(mydata, breaks=6)
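# Flag bins below the threshold, then find runs of flagged/unflagged bins
f1 <- h$counts < 2
f2 <- rle(f1)
if (length(f2$lengths) == 3) {
  # Three runs, assumed to be low/kept/low: there is one more break than
  # bins, so lengthen the middle run by one to line the flags up with breaks
  f2$lengths[2] <- f2$lengths[2] + 1
  f2 <- which(inverse.rle(f2))
} else {
  f2 <- which(f1)
}
# Drop the flagged bins from counts, density and mids, then the outer breaks
h[2:4] <- lapply(h[2:4], "[", !f1)
h[[1]] <- h[[1]][-f2]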
plot(h)
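When the low bins can appear anywhere, one alternative is to leave the histogram object alone and draw only the surviving bars yourself. This is a minimal sketch that assumes at least one bin passes the threshold; keep, left and right are names introduced here, not part of the histogram object.

```r
mydata <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 12)
h <- hist(mydata, breaks = 6, plot = FALSE)

# Bins that meet the threshold, with their left and right edges
keep  <- h$counts >= 2
left  <- h$breaks[-length(h$breaks)][keep]
right <- h$breaks[-1][keep]

# Empty plot sized to the kept bins, then one rectangle per bin
plot(NULL, xlim = range(left, right), ylim = c(0, max(h$counts[keep])),
     xlab = h$xname, ylab = "Frequency")
rect(left, 0, right, h$counts[keep])
```

Dropped bins in the middle simply show up as gaps, since each bar is drawn at its original position.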
Upvotes: 2
Reputation: 17090
You can directly manipulate the object returned by hist and plot it:
h$counts[ h$counts < 2 ] <- 0
plot(h)
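A self-contained variant of the same idea, as a sketch: plot = FALSE skips drawing the unfiltered histogram first, and the density vector is zeroed as well because plot(h, freq = FALSE) reads h$density rather than h$counts. The name low is introduced here for illustration.

```r
mydata <- c(1, 3, 4, 5, 6, 7, 8, 9, 10)

# Compute the histogram without drawing it
h <- hist(mydata, breaks = 5, plot = FALSE)

# Zero out bins below the threshold in both counts and density
low <- h$counts < 2
h$counts[low]  <- 0
h$density[low] <- 0  # only matters for plot(h, freq = FALSE)

plot(h)
```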
Upvotes: 0