Mason

Reputation: 229

How to use ggplot2 in R to plot a boxplot with number of observations?

With a data frame that has a grouping column and a value column, plotting a grouped boxplot with ggplot2 is done like this:

ggplot(data=data, aes(x = Grouping, y = Value, group = Grouping)) + geom_boxplot()

However, how would you plot a grouped boxplot when an extra column gives the number of observations for each value/grouping pair? For example, in the data frame below there are 17 data points for grouping A (10 + 7) and 11 for grouping B (2 + 9), each with its respective value.

Grouping   Value   NumberObservations
A          1       10
B          1       2
A          2       7
B          2       9

Of course, I could create another data frame that contains 10 rows of grouping A with value 1, and so on, and then use the ggplot method above. I want to avoid this because the expanded data frame would get very large given the number of observations. Is there a way to weight by, or otherwise supply, the number of observations directly in a ggplot boxplot?

Upvotes: 0

Views: 467

Answers (1)

Gregor Thomas

Reputation: 145745

Neither the base boxplot function nor ggplot2's geom_boxplot expects data with weights/counts like this, so I think your best bet is to expand the data into individual observations.

expanded_data = data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]
ggplot(data = expanded_data,
  aes(x = Grouping, y = Value, group = Grouping)) + 
  geom_boxplot()
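For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data frame from the question, expands it the same way, and checks the per-group row counts before plotting. The name `row_index` is just an illustrative helper, and the data frame is typed in by hand from the table in the question.

```r
library(ggplot2)

# Example data from the question: one row per Grouping/Value pair with its count
data <- data.frame(
  Grouping = c("A", "B", "A", "B"),
  Value = c(1, 1, 2, 2),
  NumberObservations = c(10, 2, 7, 9)
)

# Repeat each row index as many times as its count, then index rows by it
# (row_index is a hypothetical helper name, not from the original answer)
row_index <- rep(seq_len(nrow(data)), times = data$NumberObservations)
expanded_data <- data[row_index, ]

# Sanity check: rows per group should equal the summed counts (17 for A, 11 for B)
print(table(expanded_data$Grouping))

# Same plot call as above, now on one row per observation
p <- ggplot(expanded_data, aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
print(p)
```

The expanded data frame has one row per observation, so its size grows with the total of NumberObservations. If memory is the concern, dropping columns you do not need for the plot before expanding should keep it smaller.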

Upvotes: 3
