Reputation: 2713
I have a frequency table (frequency, value) and would like to plot it as a histogram in ggplot2. Specifically, I have frequencies for each value 1...1e6 and would like breaks at 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000, ...
The table is computed from a huge data set, which is why using rep, as suggested in answers like this one, is not an option.
Here is a minimal example:
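library(ggplot2)
# One row per value, with its (here random) frequency
data <- data.frame(count=(runif(1000) * 100), value=1:1000)
# Expands the table to one row per observation; rep() truncates fractional counts
repdata <- data.frame(value=rep(data$value, data$count))
print(ggplot(repdata) + aes(x=value) + scale_x_log10() +
      geom_histogram(binwidth=0.1))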
How can I create a plot like this without using the repdata line?
Is there an aggregate function that takes a data frame and a list of breaks?
Upvotes: 0
Views: 820
Reputation: 2713
Ahh, it didn't occur to me until now that I don't have to use a list of breaks; I can simply calculate the bin index from the value and use the existing aggregate:
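binw <- 0.1  # bin width in log10 units
# Bin index of each value on the log10 scale
data$bin <- floor(log10(data$value) / binw)
# Sum the frequencies that fall into the same bin
hdata <- aggregate(count ~ bin, data, sum)
# Draw each bin as a rectangle spanning its edges on the original scale
print(ggplot(hdata) +
      aes(xmin=10^(bin * binw),
          xmax=10^((bin + 1) * binw),
          ymin=0,
          ymax=count) +
      scale_x_log10() +
      geom_rect())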
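If you do want the explicit break list from the question (1, 2, ..., 10, 20, ..., 100, 200, ...), the same aggregate approach should still work. Here is a minimal sketch, assuming the toy data from the question; the names breaks, xmin and xmax are my own, and findInterval (base R) is swapped in for the bin-index formula. Note that these bins are not equally wide on the log axis, so bar heights are plain per-bin sums rather than densities.

```r
library(ggplot2)

# Same toy data as in the question: one row per value with its frequency
data <- data.frame(count = runif(1000) * 100, value = 1:1000)

# Break list 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000
# (extend 0:2 to cover more decades, e.g. 0:5 for values up to 1e6)
breaks <- unique(as.vector(outer(1:10, 10^(0:2))))

# Index of the interval [breaks[i], breaks[i + 1]) containing each value;
# rightmost.closed = TRUE puts the largest break into the last interval
data$bin <- findInterval(data$value, breaks, rightmost.closed = TRUE)

# Sum the frequencies per bin and attach the bin edges
hdata <- aggregate(count ~ bin, data, sum)
hdata$xmin <- breaks[hdata$bin]
hdata$xmax <- breaks[hdata$bin + 1]

p <- ggplot(hdata) +
  aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = count) +
  scale_x_log10() +
  geom_rect()
print(p)
```

This never expands the table, so it scales with the number of distinct values rather than the number of observations.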
Upvotes: 2