Cameron So

Reputation: 149

Sizing scatter plot point mean proportional to sample size

I am creating a scatter plot using ggplot2 and would like to size my point means proportional to the sample size used to calculate the mean. This is my code, where I used fun.y to calculate the mean by group Trt:

branch1 %>%
ggplot() + aes(x=Branch, y=Flow_T, group=Trt, color=Trt) +
stat_summary(aes(group=Trt), fun.y=mean, geom="point", size=)

I am relatively new to R, but my guess is to use size in the aes function to resize my points. I thought it might be a good idea to extract the sample sizes used in fun.y=mean and create a new class that could be inputted into size, however I am not sure how to do that.

Any help will be greatly appreciated! Cheers.

EDIT

Here's my data for reference:

   Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt     Dmg
   <int>  <dbl> <int>     <int> <int>  <dbl> <fct> <int>
 1     1   1.00     0        16    20  36.0  Early     1
 2     1   2.00     0         1    17  18.0  Early     1
 3     1   3.00     0         0    17  17.0  Early     1
 4     1   4.00     0         3    14  17.0  Early     1
 5     1   5.00     5         2     4  11.0  Early     1
 6     1   6.00     0         3     7  10.0  Early     1
 7     1   7.00     0         4     6  10.0  Early     1
 8     1   8.00     0        13     6  19.0  Early     1
 9     1   9.00     0         2     7   9.00 Early     1
10     1  10.0      0         2     3   5.00 Early     1

EDIT 2:

Here is a graph of what I'm trying to achieve with proportional sizing by sample size n per Trt (treatment), where the mean is calculated per Trt and Branch number. I'm wondering if I should make Branch a categorical variable.

Plot without Proportional Sizing

Upvotes: 0

Views: 2371

Answers (2)

Maurits Evers

Reputation: 50728

If I understood you correctly you would like to scale the size of points based on the number of points per Trt group.

How about something like this? Note that I appended one Late row to your sample data, because Trt otherwise contains only Early entries.

df %>%
    group_by(Trt) %>%
    mutate(ssize = n()) %>%
    ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = ssize)) +
        geom_point()


Explanation: We group by Trt, calculate the number of samples per group (ssize), and map it with aes(..., size = ssize) so that the size of the points scales with ssize. You don't need the group aesthetic here.
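To see what the group_by() and mutate(n()) step produces before anything is plotted, here is a minimal sketch on a made-up toy data frame (toy and toy_sized are hypothetical names, not the asker's data). It assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data, not the asker's data set
toy <- data.frame(
  Trt    = c("Early", "Early", "Early", "Late"),
  Flow_T = c(36, 18, 17, 20)
)

# mutate() keeps every row and attaches the size of its group to each one
toy_sized <- toy %>%
  group_by(Trt) %>%
  mutate(ssize = n()) %>%
  ungroup()

print(toy_sized)
```

Every Early row carries the same ssize and the single Late row carries its own, which is why mapping size = ssize gives all points of one treatment the same size.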


Update

To scale points according to the mean of Flow_T per Trt we can do:

df %>%
    group_by(Trt) %>%
    mutate(
        ssize = n(),
        mean.Flow_T = mean(Flow_T)) %>%
    ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = mean.Flow_T)) +
        geom_point()



Sample data

# Sample data
df <- read.table(text =
    "Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt     Dmg
1     1   1.00     0        16    20  36.0  Early     1
2     1   2.00     0         1    17  18.0  Early     1
3     1   3.00     0         0    17  17.0  Early     1
4     1   4.00     0         3    14  17.0  Early     1
5     1   5.00     5         2     4  11.0  Early     1
6     1   6.00     0         3     7  10.0  Early     1
7     1   7.00     0         4     6  10.0  Early     1
8     1   8.00     0        13     6  19.0  Early     1
9     1   9.00     0         2     7   9.00 Early     1
10     1  10.0      0         2     3   5.00 Early     1
11     1  10.0      0         2     3   20.00 Late     1", header = TRUE)

Upvotes: 1

Cameron So

Reputation: 149

Using @Maurits Evers's help, I created my desired graph by making Branch a factor. The following is my code as well as my intended graph:

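# Treat Branch as categorical so a mean is computed per Branch level
branch1$Branch <- as.factor(branch1$Branch)
branch1$Flow_T <- as.numeric(branch1$Flow_T)
branch1 %>%
  group_by(Trt, Branch) %>%
  mutate(ssize = n()) %>%   # sample size per Trt x Branch combination
  ggplot(aes(x = Branch, y = Flow_T, colour = Trt)) +
  # fun.y is deprecated as of ggplot2 3.3.0; newer versions use fun = mean
  stat_summary(aes(size = ssize), fun.y = mean, geom = "point")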

Final Plot
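An alternative that may be easier to inspect is to compute the means and sample sizes first and then plot the summary table directly. This is only a sketch on made-up data (toy, branch_means, mean_Flow_T and n are hypothetical names), and it assumes dplyr 1.0.0 or later for the .groups argument. scale_size_area() is swapped in here so that point area is proportional to n; it is not part of the code above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with two treatments and two branches
toy <- data.frame(
  Branch = factor(c(1, 1, 1, 2, 2, 1, 2)),
  Trt    = c("Early", "Early", "Early", "Early", "Early", "Late", "Late"),
  Flow_T = c(36, 18, 17, 10, 19, 20, 5)
)

# One row per Trt x Branch: the mean and the number of observations behind it
branch_means <- toy %>%
  group_by(Trt, Branch) %>%
  summarise(mean_Flow_T = mean(Flow_T), n = n(), .groups = "drop")

# Plot the pre-computed means; point area scales with the sample size
p <- ggplot(branch_means,
            aes(x = Branch, y = mean_Flow_T, colour = Trt, size = n)) +
  geom_point() +
  scale_size_area()

print(p)
```

Because the summary is an ordinary data frame, the sample sizes behind each point can be checked by printing branch_means before plotting.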

Upvotes: 0
