Elad663

Reputation: 813

ggplot2 - how to adjust stat_bin and stat to use calculation of a different variable

The goal is to draw a "histogram" of x in which the height of each bar is sum(y)/count(x) for that bin, where y is another variable in the data. The point is to let ggplot's binning do the grouping: I do not want to compute the bins myself and then perform the calculation.

example:

    library(ggplot2)
    library(data.table)

    k <- runif(1000)
    k <- k[order(k)]

    y <- c(rbinom(n = 500, size = 1, prob = .05), rbinom(n = 500, size = 1, prob = .95))

    w <- data.table(k, y)

So plot(w$k, w$y) gives:

[image: scatter plot of y against k]

So, in theory, what I am looking for looks like this:

    ggplot(w, aes(k)) + geom_histogram(aes(y = stat(sum(y)/count)))

But it generates this:

[image: plot produced by the geom_histogram call above]

Upvotes: 0

Views: 216

Answers (1)

dario

Reputation: 6483

Not sure if this is what you want, but sum(y) is going to be the same for all bars:

    library(ggplot2)
    library(data.table)
    set.seed(13434)
    k <- runif(1000)
    k <- k[order(k)]
    y <- c(rbinom(n = 500, size = 1, prob = .05), rbinom(n = 500, size = 1, prob = .95))
    w <- data.table(k, y)

    constant_value <- sum(w$y)

    ggplot(w, aes(k)) + geom_histogram(aes(y = stat(constant_value/count)))

gives exactly the same plot as

    ggplot(w, aes(k)) + geom_histogram(aes(y = stat(sum(w$y)/count)))
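Since sum(y)/count within a bin is just the mean of y in that bin, one possible way to let ggplot2 do the binning is `stat_summary_bin()`, which bins x and applies a summary function to the y values of each bin. The following is only a sketch, assuming ggplot2 3.3.0 or later (where the argument is called `fun`); the object name `p` is made up for illustration, and `bins = 30` is chosen to mirror the histogram default.

```r
library(ggplot2)

set.seed(13434)
k <- sort(runif(1000))
y <- c(rbinom(n = 500, size = 1, prob = .05), rbinom(n = 500, size = 1, prob = .95))
w <- data.frame(k, y)

# stat_summary_bin() bins k and applies `fun` to the y values in each bin.
# mean(y) within a bin is the same quantity as sum(y) / count for that bin.
p <- ggplot(w, aes(k, y)) +
  stat_summary_bin(fun = "mean", geom = "bar", bins = 30) +
  labs(x = "k", y = "mean y")

print(p)
```

Because y is 0/1 here, each bar should show the share of ones among the observations falling in that bin.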

Edit:

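Not sure if this helps you. Here I do the binning myself, cutting the sorted rows into groups of 30 consecutive observations (note that ggplot2's histogram default is 30 bins, not a bin width of 30):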

    library(tidyverse)
    w %>% 
        arrange(k) %>% 
        mutate(bin = cut_interval(1:length(k), length=30, labels=FALSE)) %>% 
        group_by(bin) %>% 
        summarise(mean_y = mean(y),
                  mean_k = mean(k),
                  width = max(k) - min(k)) %>% 
        ggplot(aes(mean_k, mean_y, width=width)) +
        geom_bar(stat="identity") +
        labs(x="k", y="mean y")

which makes this figure:

[image: bar chart of mean y against k]
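For comparison, the same manual approach can be sketched with equal-width bins in k, which is closer to how a histogram bins its data than groups with equal numbers of rows. This is an illustrative variant rather than part of the original answer: it assumes `cut_interval(k, n = 30)` for the binning and base R `aggregate()` for the per-bin means, and the names `binned` and `p2` are hypothetical.

```r
library(ggplot2)

set.seed(13434)
k <- sort(runif(1000))
y <- c(rbinom(n = 500, size = 1, prob = .05), rbinom(n = 500, size = 1, prob = .95))
w <- data.frame(k, y)

# Split the range of k into 30 intervals of equal width
w$bin <- cut_interval(w$k, n = 30)

# Per-bin means of k and y; mean(y) equals sum(y) / count within each bin
binned <- aggregate(cbind(k, y) ~ bin, data = w, FUN = mean)
names(binned) <- c("bin", "mean_k", "mean_y")

p2 <- ggplot(binned, aes(mean_k, mean_y)) +
  geom_col() +
  labs(x = "k", y = "mean y")

print(p2)
```

Bars are positioned at the mean k of each bin, so their spacing can be slightly uneven even though the bins themselves have equal width.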

Upvotes: 1
