I need to plot several density functions on a single plot. Each density corresponds to a subset of my overall dataset, and the subsets are defined by the value of one of the variables in the dataset.
Concretely, I would like to draw a density function for 1-, 3-, and 10-year horizons. Of course, the 10-year horizon includes the shorter ones. Likewise, the 3-year horizon density should also include the data from the last year.
The subsets need to correspond to data[period == 1, ], data[period <= 3, ] and data[period <= 10, ] (i.e., all of data).
I have managed to do this by layering several geom_density calls on top of each other, i.e., by redefining the data for each layer:
ggplot() +
geom_density(data = data[period <=3,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="red") +
geom_density(data = data[period ==1,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="grey") +
geom_density(data = data, aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="green")
It works, but I suspect this is not the right way to do it (for example, it makes creating a legend cumbersome).
On the other hand, doing it like this:
ggplot(data, aes(x=BEST_CUR_EV_TO_EBITDA, color=period)) +
geom_density(alpha=.2, fill="blue")
doesn't work, because the periods are then treated as mutually exclusive groups.
Is there a way to specify aes(color) based on the value of period when the subsets overlap?
Runnable code:
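To see why, here is a small base-R sketch (the dates and cutoffs are illustrative assumptions, not the question's data). Labelling each day with a single period value puts it in exactly one group, whereas the nested horizons overlap:

```r
# Illustrative dates (assumption): ten years of daily observations
date <- seq(as.Date("2010-01-01"), as.Date("2020-01-01"), by = "day")

# Label each day as in the question: 10 by default, then 3, then 1
# (later assignments overwrite earlier ones)
period <- rep(10, length(date))
period[date >= as.Date("2017-01-01")] <- 3
period[date >= as.Date("2019-01-01")] <- 1

# Grouping by the label (e.g. aes(fill = factor(period))): disjoint groups
disjoint <- table(period)

# Nested horizons (what is wanted): each horizon contains the shorter ones
nested <- c("1" = sum(period == 1), "3" = sum(period <= 3), "10" = length(period))

print(disjoint)
print(nested)
```

The disjoint counts add up to the number of days, while the nested 3-year count equals the disjoint 1-year plus 3-year counts.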
library(data.table)
library(lubridate)
library(ggplot2)
YEARS <- 10
today <- Sys.Date()
lastYr <- Sys.Date()-years(1)
last3Yr <- Sys.Date()-years(3)
start.date <- Sys.Date()-years(YEARS)
# one simulated value per day over the last YEARS years
date <- seq(start.date, Sys.Date(), by=1)
BEST_CUR_EV_TO_EBITDA <- rnorm(length(date), 3,1)
data <- cbind.data.frame(date, BEST_CUR_EV_TO_EBITDA)
# every row starts in the 10-year bucket
data <- cbind.data.frame(data, period = rep(10, nrow(data)))
# set period to value for all rows with from <= date <= to
subPeriods <- function(aDf, from, to, value){
  aDf[aDf$date >= from & aDf$date <= to, "period"] = value
  return(aDf)
}
# order matters: the 1-year label overwrites part of the 3-year range
data <- subPeriods(data, last3Yr, today, 3)
data <- subPeriods(data, lastYr, today, 1)
data <- data.table(data)
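# note: the layers below use constant fill values and map no colour aesthetic,
# so colScale has no effect and no legend is drawn
colScale <- scale_colour_manual(
name = "horizon"
, values = c("1 Y" = "grey", "3 Y" = "red", "10 Y" = "green"))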
ggplot() +
geom_density(data = data[period <=3,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="red") +
geom_density(data = data[period ==1,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="grey") +
geom_density(data = data, aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="green") +
colScale
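With the layered approach from the question, a legend can still be obtained by mapping fill to a constant label inside each layer's aes() and supplying the colours through scale_fill_manual() instead of scale_colour_manual(). A minimal sketch on stand-in data (the data frame below is an assumption, not the question's data):

```r
library(ggplot2)

# Stand-in data (assumption): 3650 daily values labelled with the same 10/3/1 scheme
set.seed(1)
data <- data.frame(
  BEST_CUR_EV_TO_EBITDA = rnorm(3650, 3, 1),
  period = rep(c(10, 3, 1), times = c(2555, 730, 365))
)

# Mapping fill to a constant string inside aes() turns each layer into a legend entry;
# scale_fill_manual() then assigns the colours used in the question's colScale
p <- ggplot(mapping = aes(x = BEST_CUR_EV_TO_EBITDA)) +
  geom_density(data = data[data$period <= 3, ], aes(fill = "3 Y"), alpha = .2) +
  geom_density(data = data[data$period == 1, ], aes(fill = "1 Y"), alpha = .2) +
  geom_density(data = data, aes(fill = "10 Y"), alpha = .2) +
  scale_fill_manual(
    name = "horizon",
    values = c("1 Y" = "grey", "3 Y" = "red", "10 Y" = "green"),
    breaks = c("1 Y", "3 Y", "10 Y")  # fixes the legend order
  )

print(p)
```

This keeps one geom_density call per horizon, so it still repeats itself, but the legend comes for free.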
One way to deal with dependent (overlapping) grouping is to create independent groups from the existing ones. Below, I create three new columns (period_one, period_three and period_ten) with the mutate function, where:
- period_one = BEST_CUR_EV_TO_EBITDA values for period == 1
- period_three = BEST_CUR_EV_TO_EBITDA values for period <= 3
- period_ten = BEST_CUR_EV_TO_EBITDA values for all periods

These columns are then converted to long format with the gather function: the column names (period_one, period_three and period_ten) are stacked in the "period" variable, and the corresponding values go in the "val" column.
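library(dplyr)
library(tidyr)
df2 <- data %>%
# copy the value into one column per horizon; NA where the row is outside that horizon
mutate(period_one=ifelse(period==1, BEST_CUR_EV_TO_EBITDA, NA),
period_three=ifelse(period<=3, BEST_CUR_EV_TO_EBITDA, NA),
period_ten=BEST_CUR_EV_TO_EBITDA) %>%
select(date, starts_with("period_")) %>%
# stack the three columns: column names go to "period", values to "val"
gather(period, val, period_one, period_three, period_ten)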
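gather() still works, but in tidyr 1.0 and later it is superseded by pivot_longer(). A minimal sketch of the same reshape with pivot_longer(), run on a three-row toy table (an assumption, not the question's data), also shows what the long format looks like:

```r
library(dplyr)
library(tidyr)

# Toy data (assumption): three days, one in each period bucket
data <- data.frame(
  date = as.Date(c("2012-06-01", "2018-06-01", "2019-06-01")),
  BEST_CUR_EV_TO_EBITDA = c(2.5, 3.0, 3.5),
  period = c(10, 3, 1)
)

df2 <- data %>%
  mutate(period_one = ifelse(period == 1, BEST_CUR_EV_TO_EBITDA, NA),
         period_three = ifelse(period <= 3, BEST_CUR_EV_TO_EBITDA, NA),
         period_ten = BEST_CUR_EV_TO_EBITDA) %>%
  select(date, starts_with("period_")) %>%
  # one row per (date, horizon); values_drop_na drops the masked-out NA cells
  pivot_longer(starts_with("period_"), names_to = "period", values_to = "val",
               values_drop_na = TRUE)

print(df2)
```

The 2019 row appears under all three horizons, the 2018 row under period_three and period_ten, and the 2012 row only under period_ten, which is exactly the overlap the question asks for.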
With the data in long format and the groups now independent, the plot is straightforward:
ggplot(df2, aes(val, fill=period)) + geom_density(alpha=.2, na.rm=TRUE)
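By default the legend shows the column names (period_one, period_ten, period_three, in alphabetical order). To reuse the question's colours and "1 Y"/"3 Y"/"10 Y" labels, a scale_fill_manual() layer can be added. A sketch on stand-in long-format data (the df2 built below is an assumption with the same period and val columns):

```r
library(ggplot2)

# Stand-in long-format data (assumption) with the same columns as df2
set.seed(1)
df2 <- data.frame(
  period = rep(c("period_one", "period_three", "period_ten"), times = c(365, 1095, 3650)),
  val = rnorm(365 + 1095 + 3650, 3, 1)
)

p <- ggplot(df2, aes(val, fill = period)) +
  geom_density(alpha = .2, na.rm = TRUE) +
  # colours from the question; breaks fixes the legend order, labels renames the entries
  scale_fill_manual(
    name = "horizon",
    values = c(period_one = "grey", period_three = "red", period_ten = "green"),
    breaks = c("period_one", "period_three", "period_ten"),
    labels = c("1 Y", "3 Y", "10 Y")
  )

print(p)
```

Passing breaks and labels as two vectors in the same order keeps each label matched to its horizon.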