Samet Sökel

Reputation: 2670

ggplot geom_boxplot by grouping rows

Here is my data:

[image: the data table]

What I'm trying to get is something like this:

[image: the desired boxplot]

The y-axis should represent the column 'yogunluk', and the x-axis should show a boxplot of the 'deney' values for every 'yogunluk'. For example, the first row of my data gives the vector c(7,8,15,11,9,10), and one of the boxplots should be drawn from these values. I can't figure out how to set up that kind of mapping in geom_boxplot.

Upvotes: 0

Views: 620

Answers (2)

chemdork123

Reputation: 13803

I've tried to recreate as close as possible what you are showing as your example plot, both because it is fun to do and because it can demonstrate some elements of how to organize your data and how to build a plot with ggplot2.

Data Wrangling

First, your data is not in what is called Tidy Data format. Here is my recreation of it:

yourData <- data.frame(
  yogunluk = c(5,10,15,20),
  deney1 = c(7,12,14,19),
  deney2 = c(8,17,18,25),
  deney3 = c(15,13,19,22),
  deney4 = c(11,18,17,23),
  deney5 = c(9,19,16,18),
  deney6 = c(10,15,18,20),
  toplam = c(60,94,102,127)
)

You should consider that you have a value for every deney in your dataset at a given yogunluk. Thinking of it this way, you need to "gather" all the deney columns (deney1, deney2, deney3, ...) and their values into two new columns: one identifying which deney a value belongs to, and one holding the value itself. We will use the gather() function from tidyr to do this, but you can also use pivot_longer(), which the tidyr documentation recommends over gather() for new code:

library(dplyr)
library(tidyr)
library(ggplot2)

yourData <- yourData %>%
  select(yogunluk, deney1,deney2,deney3,deney4,deney5,deney6) %>%
  gather(deney, value, -yogunluk)

Note that we first select only the columns for yogunluk and deney1, deney2, ..., and disregard the summary statistic columns. The reason is that we will summarize the data within the ggplot2 functions themselves. The alternative approach is to summarize the data beforehand, then plot that. It's up to you, but I find it nicer to do it this way when you might show both individual data points and their summaries (like the average of a group of data) in the same plot.
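For comparison, here is a minimal sketch of the same reshaping step written with pivot_longer() instead of gather(). The names wideData and longData are made up for this snippet so that it runs on its own, and starts_with("deney") is just one way to pick the columns; the row order of the result differs from gather(), which does not matter for plotting.

```r
library(dplyr)
library(tidyr)

# Same wide data as in the recreation above (hypothetical name: wideData)
wideData <- data.frame(
  yogunluk = c(5, 10, 15, 20),
  deney1 = c(7, 12, 14, 19),
  deney2 = c(8, 17, 18, 25),
  deney3 = c(15, 13, 19, 22),
  deney4 = c(11, 18, 17, 23),
  deney5 = c(9, 19, 16, 18),
  deney6 = c(10, 15, 18, 20),
  toplam = c(60, 94, 102, 127)
)

# Keep yogunluk plus every deney column, then stack the deney columns
# into a "deney" (which run) column and a "value" (measurement) column
longData <- wideData %>%
  select(yogunluk, starts_with("deney")) %>%
  pivot_longer(cols = -yogunluk, names_to = "deney", values_to = "value")

head(longData)
```

The result has one row per yogunluk and deney combination, which is the shape the plotting code expects.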

The Plot

Now that your data is formatted correctly, I'm going to create the plot. I'm assuming you actually switched the x and y axes in your description: the x axis runs along the bottom, so from left to right we will place the yourData$yogunluk values, while the values for each deney will be used to draw the boxplots along the y axis.

I'll show you the code and the plot, then explain a bit how each part works under that:

ggplot(yourData, aes(x=factor(yogunluk), y=value)) +
  geom_boxplot(width=0.5, fill='dodgerblue3', alpha=0.5) +
  stat_summary(geom='point', shape=10, size=5) +
  stat_summary(geom='line', aes(group=1)) +
  
  labs(y='Data', x='Yogunluk', title='Boxplot of Something') +
  
  theme_classic() +
  theme(
    plot.background = element_rect(fill='gray90'),
    plot.margin = margin(30,30,30,30),
    plot.title.position = 'plot',
    plot.title = element_text(hjust=0.5, size=16)
  )

[image: the resulting plot]

Discussion on Plot Code

I've broken the plot code into parts intentionally (and usually do so on my own plots) just so that I can keep track of the various "parts" of the plot itself. I'll explain each part in turn.

Plot area and geoms

The first part defines the plot aspects, main aesthetics, and geoms. This covers everything that's added to the plot based on the dataset yourData. Note first that I refer to factor(yogunluk) rather than just yogunluk. Why? Because yourData$yogunluk is numeric, which ggplot2 treats as a continuous variable, and here we want to plot it as a discrete variable. Grouping the boxplots by yogunluk by definition makes our x axis discrete, so we can force this by asking ggplot2 to treat the column as a factor.
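A quick way to see the difference is to check how R stores the column and what factor() turns it into. This is a small base R sketch; the standalone vector yogunluk and the name yogunlukFactor are only used here for illustration.

```r
# The yogunluk values as they were entered in data.frame()
yogunluk <- c(5, 10, 15, 20)

class(yogunluk)       # "numeric": treated as continuous by ggplot2
is.integer(yogunluk)  # FALSE: c(5, 10, ...) creates doubles, not integers

# factor() turns each distinct value into a level (a discrete category)
yogunlukFactor <- factor(yogunluk)
levels(yogunlukFactor)  # "5" "10" "15" "20"
```

With four levels on the x axis, geom_boxplot() draws one box per level.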

The geom_boxplot() code is pretty straightforward.

The stat_summary() commands take the place of geom_point() and geom_line(). Why use stat_summary()? Because we don't have summarized data. You can understand it this way: geom_point() and geom_line() plot every observation in the data separately. If you want to plot a point on summarized data (like the mean()), you use stat_summary(). It first calculates a summary statistic (the default is mean_se()), then uses the result to replace the y aesthetic. In this case, stat_summary(geom='point') calculates mean_se() for each set of yourData$value, grouped by factor(yourData$yogunluk), and that mean is used as the new y aesthetic.
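To make the summary step concrete, the sketch below computes the per-group means by hand and compares them with what mean_se() returns for one group. The long-format data is rebuilt inline under the made-up name longData so the snippet runs on its own, and the explicit fun = mean argument assumes ggplot2 3.3.0 or later (older versions called it fun.y).

```r
library(ggplot2)

# Long-format data, rebuilt inline (hypothetical name: longData)
longData <- data.frame(
  yogunluk = rep(c(5, 10, 15, 20), times = 6),
  value = c(7, 12, 14, 19,   8, 17, 18, 25,   15, 13, 19, 22,
            11, 18, 17, 23,  9, 19, 16, 18,   10, 15, 18, 20)
)

# The means that the summary points are drawn at, one per yogunluk
groupMeans <- aggregate(value ~ yogunluk, data = longData, FUN = mean)
groupMeans

# mean_se() returns y (the mean) plus ymin and ymax (mean -/+ one standard error);
# a point geom only uses y
firstGroup <- mean_se(longData$value[longData$yogunluk == 5])
firstGroup

# Naming the summary function explicitly avoids relying on the default
p <- ggplot(longData, aes(x = factor(yogunluk), y = value)) +
  stat_summary(fun = mean, geom = "point", shape = 10, size = 5)
```

The y values in groupMeans should match the toplam column divided by six.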

One extra note on stat_summary(geom='line', ...): we assign the group aesthetic a constant value of 1. Why? So that ggplot2 treats the whole thing as one "group". Otherwise, ggplot2 will still compute the average for each discrete value of yogunluk (which is what we want), but it will not know that it needs to connect these points. Assigning group=1 tells ggplot2 that "all these points should be part of the same group... so draw a line connecting them, please."
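The effect of group=1 can be checked without drawing anything by inspecting the data ggplot2 computes for the line layer. This is a rough sketch using layer_data(); the names longData, pSeparate, and pJoined are made up for the example, and fun = mean again assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Long-format data, rebuilt inline (hypothetical name: longData)
longData <- data.frame(
  yogunluk = rep(c(5, 10, 15, 20), times = 6),
  value = c(7, 12, 14, 19,   8, 17, 18, 25,   15, 13, 19, 22,
            11, 18, 17, 23,  9, 19, 16, 18,   10, 15, 18, 20)
)

base <- ggplot(longData, aes(x = factor(yogunluk), y = value))

# Without group = 1: each level of the discrete x axis is its own group,
# so every "line" has a single point and nothing gets connected
pSeparate <- base + stat_summary(fun = mean, geom = "line")

# With group = 1: all four means belong to one group and are joined
pJoined <- base + stat_summary(fun = mean, geom = "line", aes(group = 1))

separateGroups <- unique(layer_data(pSeparate)$group)
joinedGroups <- unique(layer_data(pJoined)$group)

length(separateGroups)  # four groups, one per yogunluk level
length(joinedGroups)    # a single group
```

Printing pSeparate should show an empty panel with a message about each group having only one observation, while pJoined shows the connected line.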

Labels

Pretty straightforward I think.

Theme Elements

Also pretty straightforward. For a full list of theme elements, check out the ggplot2 reference page for theme().

Upvotes: 1

Vons

Reputation: 3325

This is one option, to first reshape the data and then feed it to ggplot2.

library(tidyr)
library(dplyr)
library(ggplot2)

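# Recreate the data. deney6 is included because the question's first row
# has six values, c(7,8,15,11,9,10), which also matches toplam = 60.
dat=data.frame(yogunluk=c(5,10,15,20),
               deney1=c(7,12,14,19),
               deney2=c(8,17,18,25),
               deney3=c(15,13,19,22),
               deney4=c(11,18,17,23),
               deney5=c(9,19,16,18),
               deney6=c(10,15,18,20),
               toplam=c(60,94,102,127),
               ortlama=c(10,15.6,17,21.16),
               sd=c(2.8,2.8,1.7,2.6),
               var=c(8,7,3,6.9))

# Reshape to long format: one row per yogunluk and deney combination
d=pivot_longer(dat, cols=deney1:deney6)
# Attach the per-yogunluk mean as a column so it can be drawn over the boxplots
d=d %>% group_by(yogunluk) %>% summarize(mea=mean(value)) %>% right_join(d, by="yogunluk")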

ggplot(d) +
  geom_boxplot(aes(yogunluk, value, group=yogunluk), fill="#3792cb", width=1) +
  geom_line(aes(yogunluk, mea)) +
  geom_point(aes(yogunluk, mea), size=3, pch=3) +
  ggtitle("Boxplot of 5;10;...") + 
  ylab("Data") + 
  xlab("") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

[image: the resulting plot]

Upvotes: 1
