Thredolsen

Reputation: 257

Plotting portion out of total counts within time point

I have the following data frame (df1):

Participant    Age    Type
   John         5      A
   John         3      B
   John         3      B
   John         3      C
   John         4      B
   Amy          5      A
   Amy          3      A
   Amy          4      C
   Amy          4      B

I am trying to plot this data using ggplot2, with Age on the y-axis and Type on the x-axis.

So far, I have been able to plot it so that, for each Type, the size of each point is proportional to the count of that Type at that age.

The code I used:

ggplot(data = df1, aes(x = Type, y = Age, color = Type)) +
  geom_point() +
  geom_count() +
  facet_wrap(~Participant)

What I am trying to get is a graph where the size of each point is proportional to the count of that type out of the total number of counts at that age.

For example, at age 3, A would be 1/4, B would be 1/2, and C would be 1/4.

I want to be able to graph this both per-participant, and for the data overall.

Upvotes: 0

Views: 297

Answers (1)

Gin_Salmon

Reputation: 847

Tell me if this is what you are after:

library(data.table)

df1 <- data.table(Participant = c("John", "John", "John", "John", "John", "Amy", "Amy", "Amy", "Amy"), 
                   Age = c(5,3,3,3,4,5,3,4,4), Type = c("A", "B",  "B", "C", "B", "A", "A", "C", "B"))

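# Total number of rows at each age (both participants pooled)
df1[, count_by_Age := .N, by = "Age"]

# Number of rows for each Age/Type combination
df1[, count_by_Age_Type := .N, by = c("Age", "Type")]

# Share of each Type among all rows at that age
df1[, proportion := count_by_Age_Type/count_by_Age]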

So df1 looks like this:

> df1
   Participant Age Type count_by_Age count_by_Age_Type proportion
1:        John   5    A            2                 2  1.0000000
2:        John   3    B            4                 2  0.5000000
3:        John   3    B            4                 2  0.5000000
4:        John   3    C            4                 1  0.2500000
5:        John   4    B            3                 2  0.6666667
6:         Amy   5    A            2                 2  1.0000000
7:         Amy   3    A            4                 1  0.2500000
8:         Amy   4    C            3                 1  0.3333333
9:         Amy   4    B            3                 2  0.6666667

So, if I've understood you correctly, the proportion column is what you'd like to map to the size aesthetic in your ggplot? Note that these proportions are computed over both participants together.

library(ggplot2)

# Map point size to the proportion computed above
g <- ggplot(df1, aes(x = Type, y = Age, colour = Type, size = proportion)) +
  geom_point() +
  facet_wrap(~Participant)
print(g)

If so, that gives you a plot with one panel per participant and point size mapped to proportion.

How's that? Might want to adjust the legend though...
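
Since you asked for both a per-participant and an overall version, here is a rough sketch of how I would split the two. It assumes the same toy data as above; the names per_participant, overall, g_part and g_all are just ones I made up for illustration. The tables are collapsed to one row per combination first, so identical points are not drawn on top of each other, and the proportion is then taken within each age (and within each participant for the faceted version).

```r
library(data.table)
library(ggplot2)

# Same toy data as above, rebuilt so this block runs on its own
df1 <- data.table(
  Participant = c("John", "John", "John", "John", "John", "Amy", "Amy", "Amy", "Amy"),
  Age  = c(5, 3, 3, 3, 4, 5, 3, 4, 4),
  Type = c("A", "B", "B", "C", "B", "A", "A", "C", "B")
)

# Per participant: one row per Participant/Age/Type combination,
# then the share of each Type within that participant's rows at that age
per_participant <- df1[, .(n = .N), by = .(Participant, Age, Type)]
per_participant[, proportion := n / sum(n), by = .(Participant, Age)]

# Overall: pool both participants, one row per Age/Type combination,
# then the share of each Type among all rows at that age
overall <- df1[, .(n = .N), by = .(Age, Type)]
overall[, proportion := n / sum(n), by = Age]

# Faceted plot using the per-participant proportions
g_part <- ggplot(per_participant,
                 aes(x = Type, y = Age, colour = Type, size = proportion)) +
  geom_point() +
  scale_size_area(max_size = 8) +  # point area scales with the proportion
  guides(colour = "none") +        # Type is already on the x-axis
  facet_wrap(~Participant)

# Single-panel plot using the pooled proportions
g_all <- ggplot(overall,
                aes(x = Type, y = Age, colour = Type, size = proportion)) +
  geom_point() +
  scale_size_area(max_size = 8) +
  guides(colour = "none")

print(g_part)
print(g_all)
```

With the pooled table, age 3 comes out as A = 1/4, B = 1/2, C = 1/4, which matches the example in the question. Dropping the colour legend is only one way to tidy the legend up; keep it if you prefer.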

Upvotes: 2
