Reputation: 229
With a data frame that has a grouping column and a value column, a grouped boxplot can be plotted with ggplot2 like this:
ggplot(data=data, aes(x = Grouping, y = Value, group = Grouping)) + geom_boxplot()
However, how would you plot a grouped boxplot when you have an extra column that gives the number of observations for each value/grouping pair? For example, in the data frame below, there are 17 data points for grouping A (10 with value 1 and 7 with value 2) and 11 for grouping B (2 with value 1 and 9 with value 2).
Grouping Value NumberObservations
A 1 10
B 1 2
A 2 7
B 2 9
Of course, another data frame can be created that contains 10 rows of grouping A and value 1 and so on to use the above ggplot method, but I want to avoid this because my data frame would get very large due to the number of observations. Is there a way to weight/add number of observations directly in a ggplot box plot?
Upvotes: 0
Views: 467
Reputation: 145745
Neither the base boxplot function nor the ggplot geom_boxplot function expects data with weights/counts like this, so I think your best bet is to expand the data into individual observations.
expanded_data = data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]
ggplot(data = expanded_data,
aes(x = Grouping, y = Value, group = Grouping)) +
geom_boxplot()
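To make the expansion step easier to check, here is a minimal self-contained sketch that rebuilds the four-row example from the question and confirms the group sizes before plotting. The names row_index and p are introduced here purely for illustration, and dropping the count column after expansion is an optional choice, not something the plot requires.

```r
library(ggplot2)

# Example data from the question: one row per value/grouping pair plus a count
data <- data.frame(
  Grouping = c("A", "B", "A", "B"),
  Value = c(1, 1, 2, 2),
  NumberObservations = c(10, 2, 7, 9)
)

# Repeat each row index as many times as its count says
# (row_index is an illustrative name, not from the original post)
row_index <- rep(seq_len(nrow(data)), times = data$NumberObservations)

# Keep only the columns the plot needs; the counts are now encoded as rows
expanded_data <- data[row_index, c("Grouping", "Value")]

# Sanity check: the question expects 17 rows for A and 11 for B
table(expanded_data$Grouping)

# Build the boxplot from the expanded data; print(p) draws it
p <- ggplot(data = expanded_data,
            aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
```

The expanded data frame has one row per observation, so its size grows with the sum of NumberObservations rather than with the number of distinct value/grouping pairs.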
Upvotes: 3