Reputation: 65
I'm trying to make multiple boxplots side by side with ggplot2. I've been following the steps in "Multiple boxplots placed side by side for different column values in ggplot", but without much luck.
I have the following dataframes
Raw <- sp500_logreturns
Normal <- rnorm(1000, 0, sd(sp500_logreturns))
Student <- cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))
And I want to make the following: Boxplot (sketch)
My Raw vector contains the log-returns of my prices, which I downloaded from Yahoo into R as an environment. I must admit I'm quite lost, and I don't know whether I'm on an impossible mission. I hope I've described my problem well enough together with my sketch. Thank you in advance.
Update 1: The goal is to compare the distribution of the raw data with the candidate distributions. The raw data is leptokurtic, so a Student distribution with 2 or 3 degrees of freedom might be more suitable than a normal distribution. To give you an idea of the data I'm looking at, here's a summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.0418425 -0.0023740 0.0005898 0.0004704 0.0045065 0.0484032
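As a rough illustration of what "leptokurtic" means here, the sketch below computes the sample excess kurtosis of simulated normal and Student draws in base R. The helper excess_kurtosis and the choice df = 5 are assumptions made for this example only (the population kurtosis of a Student distribution is undefined for df <= 4, so df = 2 or 3 would not give a stable number). It does not use the actual sp500_logreturns data.

```r
# Sample excess kurtosis: about 0 for normal data, positive for heavy tails.
# excess_kurtosis is a hypothetical helper written for this illustration.
excess_kurtosis <- function(x) {
  x <- x - mean(x)
  mean(x^4) / mean(x^2)^2 - 3
}

set.seed(1)
normal_draws  <- rnorm(1e5)
student_draws <- rt(1e5, df = 5)  # df = 5 chosen only so that kurtosis exists

k_normal  <- excess_kurtosis(normal_draws)   # close to 0
k_student <- excess_kurtosis(student_draws)  # clearly positive
```

Applying the same helper to the raw log-returns would give a quick numeric check of the heavy tails before looking at the boxplots.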
Here is my boxplot made from Edward's code: Boxplot (Edward)
Update 2: I figured it out. I used fitdist from rugarch to find the Student distribution that best fits the raw data. This way I no longer have to try matching different dfs of the Student distribution by hand. This is what I will go on with:
library(rugarch)
fitdist(distribution = 'std', sp500_logreturns)$pars
mu sigma shape
0.0008121004 0.0113748869 2.3848231857
library(dplyr)
library(tidyr)
library(ggplot2)
data <- data.frame(
Raw = as.numeric(sp500_logreturns),
Normal = rnorm(1006, 0, sd(sp500_logreturns)),
Student = rdist(distribution = 'std', n = 1006, mu = 0.0008121004, sigma = 0.0113748869, shape = 2.3848231857)
)
data2 <- pivot_longer(data, cols=everything()) %>%
mutate(name=factor(name, levels=c("Raw","Normal","Student")))
data3 <- data2 %>% summarise(min=min(value), max=max(value))
pbox1 <- (filter(data2, name %in% c("Raw","Normal","Student")) %>%
ggplot(aes(y=value, fill=name)) +
geom_boxplot() +
facet_grid(~name) +
ylab("Log-returns") +
ylim(data3$min, data3$max) +
theme(legend.position = "none",
axis.ticks.x=element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text.x=element_text(color="white"))+
ggtitle("Boxplot comparison")+
theme(plot.title = element_text(hjust = 0.5)))
And this gives me: Boxplot (final)
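For readers without rugarch, the Student draws can be approximated in base R. This is only a sketch: it assumes that rugarch's 'std' distribution is a Student t rescaled so that its standard deviation equals sigma (which requires shape > 2). The parameter values are the fitted ones shown above, and student_draws is a name introduced for this example.

```r
# Fitted parameters copied from the fitdist output above
mu    <- 0.0008121004
sigma <- 0.0113748869
shape <- 2.3848231857

set.seed(11)
n <- 1006

# Assumption: 'std' is a t distribution standardised to unit variance,
# then shifted by mu and scaled by sigma. Only valid for shape > 2.
student_draws <- mu + sigma * sqrt((shape - 2) / shape) * rt(n, df = shape)
```

With a shape this close to 2, the sample standard deviation of the draws varies a lot between seeds, so it is not a reliable check against sigma.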
Upvotes: 0
Views: 2430
Reputation: 18598
In base R:
set.seed(11)
data <- data.frame(
Raw = rnorm(1000),
Normal = rnorm(1000),
Student = cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))
)
ylim=c(min(data), max(data))
layout(matrix(1:3, ncol=3), widths=c(5,4,5))
par(las=1, mar=c(2,4,5,0))
boxplot(data$Raw, col="steelblue", ylab="Log-returns", ylim=ylim)
title(main="Raw", line=1)
par(mar=c(2,1,5,0))
boxplot(data$Normal, yaxt="n", col="tomato", ylim=ylim)
title(main="Normal", line=1)
par(mar=c(2,1,5,1))
boxplot(data[,3:4], yaxt="n", col=c("green1","green3"), names=c("df = 2","df = 3"), ylim=ylim)
title(main="Student", line=1)
title(main="Boxplot comparison", outer=TRUE, line=-1.5, cex.main=1.5)
In ggplot2, more work is involved:
set.seed(11)
data <- data.frame(
Raw = rnorm(1000),
Normal = rnorm(1000),
Student = cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))
)
library(dplyr)
library(tidyr)
library(ggplot2)
data2 <- pivot_longer(data, cols=everything()) %>%
mutate(name=factor(name, levels=c("Raw","Normal","Student.1","Student.2")))
data3 <- data2 %>% summarise(min=min(value), max=max(value))
p1 <- filter(data2, name %in% c("Raw","Normal")) %>%
ggplot(aes(y=value, fill=name)) +
geom_boxplot() +
facet_grid(~name) +
ylab("Log-returns") +
ylim(data3$min, data3$max) +
theme_bw() +
theme(legend.position = "none",
axis.ticks.x=element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text.x=element_text(color="white"))
p2 <- filter(data2, grepl("Student", name)) %>%
mutate(what="Student") %>%
ggplot(aes(x=name, y=value, fill=name)) +
geom_boxplot() +
scale_fill_manual(values=c("green1","green3")) +
scale_x_discrete(labels=c("df=2", "df=3")) +
facet_grid(~what) +
ylim(data3$min, data3$max) +
theme_bw() +
theme(legend.position = "none",
axis.title.y = element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
library(ggpubr)
ggarrange(p1, p2)
Upvotes: 2