David

Reputation: 465

Plot histogram bins only if count is above a threshold

Let's say I have a vector like this:

mydata = c(1, 3, 4, 5, 6, 7, 8, 9, 10)

A five-break histogram would look like this:

h = hist(mydata, breaks=5)

[Image: histogram of mydata with breaks=5]

How can I plot only the bins whose frequency count is above a threshold? In this case, any count greater than 1.

I would like to end up with the following histogram:

[Image: the same histogram with the count-1 bin removed]

I know I can access the counts and breaks with h$counts and h$breaks, but I cannot think of a simple way to use these to filter out some bins.

Upvotes: 2

Views: 1767

Answers (3)

Marcus

Reputation: 3636

I'm assuming that if a bucket below the threshold sits in the middle of the histogram, you just want to drop that bucket.

Given that, it's a matter of adjusting your axis limits to the first and last non-zero buckets.

So, for an initial histogram:

mydata2 <- c(1, 3, 4, 5, 6, 7, 3, 9, 10, 12)

h2 <- hist(mydata2, breaks=6)

before

It would be transformed like this

h2$counts[ h2$counts < 2] <- 0
xmin <- h2$breaks[min(which(h2$counts != 0))] 
xmax <- h2$breaks[max(which(h2$counts != 0)) + 1] 
plot(h2, xlim = c(xmin, xmax))

after
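
The steps above can be wrapped in a small function. This is only a sketch: plot_trimmed_hist and its arguments are made-up names, and the guard for the case where no bucket survives is an addition that the snippet above does not have.

```r
# Hypothetical wrapper around the steps above; name and arguments are assumptions.
plot_trimmed_hist <- function(x, breaks, min_count) {
  h <- hist(x, breaks = breaks, plot = FALSE)

  # Blank out buckets below the threshold.
  h$counts[h$counts < min_count] <- 0

  kept <- which(h$counts != 0)
  if (length(kept) == 0) stop("no bucket has a count of at least min_count")

  # Left edge of the first kept bucket, right edge of the last kept bucket.
  xlim <- c(h$breaks[min(kept)], h$breaks[max(kept) + 1])
  plot(h, xlim = xlim)
  invisible(xlim)
}

xlim_used <- plot_trimmed_hist(c(1, 3, 4, 5, 6, 7, 3, 9, 10, 12),
                               breaks = 6, min_count = 2)
```

A zeroed bucket in the middle still takes up its slot on the x axis, so it shows as a gap rather than disappearing.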

If you want to merge the middle bucket into neighboring buckets instead, that gets more complicated and depends on what merging rules you want to use.

Upvotes: 1

AkselA

Reputation: 8837

In this particular case you can do it like this, but it only works when the bins to drop are contiguous and sit at the left end of the histogram.

f <- -which(h$counts < 2)
h[1:4] <- lapply(h[1:4], "[", f)
h
# $breaks
# [1]  2  4  6  8 10
# 
# $counts
# [1] 2 2 2 2
# 
# $density
# [1] 0.1111111 0.1111111 0.1111111 0.1111111
# 
# $mids
# [1] 3 5 7 9
# 
# $xname
# [1] "mydata"
# 
# $equidist
# [1] TRUE
# 
# attr(,"class")
# [1] "histogram"

If you want to cover cases where the bins are at either end, you'll have to step up a little in code complexity.

mydata <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 12)
h <- hist(mydata, breaks=6)

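f1 <- h$counts < 2
f2 <- rle(f1)
n <- length(f2$lengths)
# breaks to drop: left edges of a leading run of small bins,
# right edges of a trailing run (small bins in the middle are not handled)
lead <- if (f2$values[1]) seq_len(f2$lengths[1]) else integer(0)
trail <- if (n > 1 && f2$values[n]) length(h$breaks) + 1 - seq_len(f2$lengths[n]) else integer(0)
drop <- c(lead, trail)

h[2:4] <- lapply(h[2:4], "[", !f1)
if (length(drop) > 0) h[[1]] <- h[[1]][-drop]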

plot(h)

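
Neither snippet above copes with a small bin in the middle, because deleting an interior break would merge its two neighbors. One way around that, as a rough sketch, is to leave the histogram object alone and draw only the bins you want with rect(). plot_bins_above is a hypothetical helper, not a base R function, and the axis labels are placeholders.

```r
# Sketch: draw only the bins whose count reaches min_count, wherever they sit.
# plot_bins_above is a made-up name for illustration.
plot_bins_above <- function(x, breaks, min_count) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  keep <- h$counts >= min_count
  if (!any(keep)) stop("no bin reaches min_count")

  left  <- h$breaks[-length(h$breaks)][keep]  # left edge of each kept bin
  right <- h$breaks[-1][keep]                 # right edge of each kept bin
  count <- h$counts[keep]

  # Empty plot sized to the kept bins, then one rectangle per bin.
  plot(NA, xlim = range(left, right), ylim = c(0, max(count)),
       xlab = "x", ylab = "Frequency")
  rect(left, 0, right, count)

  invisible(data.frame(left = left, right = right, count = count))
}

# Example data with a small bin in the middle as well as at both ends.
x_gap <- c(1, 3, 4, 5, 6, 7, 3, 9, 10, 12)
bins <- plot_bins_above(x_gap, breaks = 6, min_count = 2)
```

The returned data frame lists the edges and counts of the bins that were drawn, which is handy for checking the result without looking at the plot.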

Upvotes: 2

January

Reputation: 17090

You can directly manipulate the object returned by hist and plot it:

h$counts[ h$counts < 2 ] <- 0
plot(h)
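
As a sketch, here is the same idea as a function with the threshold as a parameter. zero_small_bins is a hypothetical name, and zeroing density as well is my own addition so that a plot with freq = FALSE hides the same bins.

```r
# Hypothetical helper (not part of base R): blank out bins whose count is
# below min_count, leaving the breaks and the axis range untouched.
zero_small_bins <- function(h, min_count) {
  small <- h$counts < min_count
  h$counts[small] <- 0
  h$density[small] <- 0  # so freq = FALSE plots hide the same bins
  h
}

mydata <- c(1, 3, 4, 5, 6, 7, 8, 9, 10)
h <- hist(mydata, breaks = 5, plot = FALSE)
h_filtered <- zero_small_bins(h, min_count = 2)
plot(h_filtered)
```

The x axis still spans all the original breaks, so the emptied bin shows up as blank space rather than being cut off.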

Upvotes: 0
