pascal

Reputation: 2713

Re-bin histogram from frequency table

I have a frequency table (frequency, value) and would like to plot it as a histogram in ggplot2. Specifically, I have a frequency for each value 1...1e6 and would like breaks at 1,2...10,20...100,200...1000...

The table is computed from a huge data set, so expanding it with rep, as suggested in answers like this one, is not an option.

Here is a minimal example:

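library(ggplot2)

# Frequency table: one row per value with its (random) count.
data <- data.frame(count=(runif(1000) * 100), value=1:1000)
# Expand to one row per observation; rep() truncates fractional counts.
# This is the step that does not scale to the real data.
repdata <- data.frame(value=rep(data$value, data$count))
print(ggplot(repdata) + aes(x=value) + scale_x_log10() +
      geom_histogram(binwidth=0.1))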

[Image: desired output]

How can I create a plot like this without using the repdata line? Is there an aggregate function that takes a data frame and a list of breaks?

Upvotes: 0

Views: 820

Answers (1)

pascal

Reputation: 2713

Ahh, it didn't occur to me until now that I don't have to use a list of breaks; I can simply calculate the bin index from the value and use the existing aggregate:

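binw <- 0.1  # bin width in log10 units, matching binwidth=0.1 in the question
# Bin index of each value on the log10 scale.
data$bin <- floor(log10(data$value) / binw)

# Sum the counts of all values that fall into the same bin.
hdata <- aggregate(count ~ bin, data, sum)

# Draw each bin as a rectangle between its lower and upper edge.
print(ggplot(hdata) +
        aes(xmin=10^(bin * binw),
            xmax=10^((bin + 1) * binw),
            ymin=0,
            ymax=count) +
        scale_x_log10() +
        geom_rect())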
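
For the explicit breaks asked about in the question (1,2...10,20...100,200...), the same idea should work with a break vector instead of a fixed log width. The following is only a sketch in base R: the names freq, breaks and binned are made up for illustration, and the toy table uses integer counts from rpois rather than the runif counts above.

```r
# Sketch: aggregate a frequency table into explicit, user-chosen bins.
# Toy frequency table (hypothetical data, not the real data set).
set.seed(1)
freq <- data.frame(value = 1:1000,
                   count = rpois(1000, lambda = 50))

# Breaks at 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000.
# outer() builds 1:10 times 1, 10, 100; unique() drops the repeated 10 and 100.
breaks <- unique(c(outer(1:10, 10^(0:2))))

# Bin index i such that breaks[i] <= value < breaks[i + 1];
# rightmost.closed = TRUE puts the largest break (1000) into the last bin.
freq$bin <- findInterval(freq$value, breaks, rightmost.closed = TRUE)

# Sum the counts per bin, then attach the bin edges for plotting.
binned <- aggregate(count ~ bin, data = freq, FUN = sum)
binned$xmin <- breaks[binned$bin]
binned$xmax <- breaks[binned$bin + 1]
```

binned can then go into the geom_rect() call above with aes(xmin=xmin, xmax=xmax, ymin=0, ymax=count). Note that these bins are not equally wide on a log axis, so the bar heights are raw counts per bin rather than densities.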

Upvotes: 2
