hartmut

Reputation: 982

Superpose densities for non-exclusive subsets

I need to draw several density functions on a single plot. Each density corresponds to a subset of my overall dataset. The subsets are defined by the value taken by one of the variables in the dataset.

Concretely, I would like to draw a density function for the 1-, 3-, and 10-year horizons. Of course, the 10-year horizon includes the shorter ones. Likewise, the 3-year horizon density should include the data from the last year. The subsets therefore need to correspond to data[period == 1,], data[period <= 3,], and the full data (i.e. data[period <= 10,]).

I have managed to do so by adding geom_density layers on top of each other, i.e., by redefining the data each time.

  ggplot() +
    geom_density(data = data[period <=3,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="red") +
    geom_density(data = data[period ==1,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="grey") +
    geom_density(data = data, aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="green")

It works fine, but I feel this is not the right way to do it (and indeed, it makes creating a legend, for example, cumbersome).

On the other hand, doing it like this:

ggplot(data, aes(x=BEST_CUR_EV_TO_EBITDA, color=period)) +
  geom_density(alpha=.2, fill="blue")

won't do because then the periods are taken to be mutually exclusive.

Is there a way to specify aes(color) based on the value taken by period where subsets overlap?

Running code:

library(data.table)
library(lubridate)
library(ggplot2)
  YEARS <- 10
  today <- Sys.Date()
  lastYr <- Sys.Date()-years(1)
  last3Yr <- Sys.Date()-years(3) 
  start.date  = Sys.Date()-years(YEARS)
  date = seq(start.date, Sys.Date(), by=1)
  BEST_CUR_EV_TO_EBITDA <- rnorm(length(date), 3,1)
  data <- cbind.data.frame(date, BEST_CUR_EV_TO_EBITDA)
  data <- cbind.data.frame(data, period = rep(10, nrow(data)))

  subPeriods <- function(aDf, from, to, value){
    aDf[aDf$date >= from & aDf$date <= to, "period"] = value
    return(aDf)
  }

  data <- subPeriods(data, last3Yr, today, 3)
  data <- subPeriods(data, lastYr, today, 1)
  data <- data.table(data)



  colScale <- scale_colour_manual(
    name = "horizon"
    , values = c("1 Y" = "grey", "3 Y" = "red", "10 Y" = "green"))

  ggplot() +
    geom_density(data = data[period <=3,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="red") +
    geom_density(data = data[period ==1,], aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="grey") +
    geom_density(data = data, aes(x=BEST_CUR_EV_TO_EBITDA), alpha=.2, fill="green") +
    colScale

Upvotes: 1

Views: 212

Answers (1)

Adam Quek

Reputation: 7153

One way to deal with dependent (overlapping) groups is to create independent groups from the existing ones. Below, I do this by creating three new columns (period_one, period_three and period_ten) with the mutate function, where

  • period_one= BEST_CUR_EV_TO_EBITDA values for period==1
  • period_three= BEST_CUR_EV_TO_EBITDA values for period<=3
  • period_ten= BEST_CUR_EV_TO_EBITDA values for all periods

These columns are then converted to long format with the gather function: the column names (period_one, period_three and period_ten) are stacked in the "period" variable, and the corresponding values go in the column "val". Rows that fall outside a given horizon hold NA in "val".

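library(dplyr)  # mutate, select, %>%
library(tidyr)  # gather

df2 <- data %>% 
    # one column per horizon; values outside the horizon become NA
    mutate(period_one=ifelse(period==1, BEST_CUR_EV_TO_EBITDA, NA),
            period_three=ifelse(period<=3, BEST_CUR_EV_TO_EBITDA, NA),
            period_ten=BEST_CUR_EV_TO_EBITDA) %>%
   select(date, starts_with("period_")) %>%
   # stack the three columns: names go to "period", values to "val"
   gather(period, val, period_one, period_three, period_ten)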

With the data in long format and the groups now independent, the ggplot call is straightforward:

ggplot(df2, aes(val, fill=period)) + geom_density(alpha=.2)
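The same idea can also be sketched without the intermediate NA columns, by stacking one copy of the rows per horizon and labelling each copy. This is only an illustration under assumptions: `toy` is made-up data standing in for the question's `data`, and the names `horizons`, `stacked` and the `horizon` column are hypothetical. Because `fill` is mapped inside `aes()`, the legend comes for free and `scale_fill_manual` can carry the colours the question asked for.

```r
library(ggplot2)

# Toy data standing in for the question's `data` (assumption: same two columns).
set.seed(1)
toy <- data.frame(
  BEST_CUR_EV_TO_EBITDA = rnorm(1000, mean = 3, sd = 1),
  period = rep(c(1, 3, 10), times = c(100, 200, 700))
)

# Upper bound on `period` for each horizon label (hypothetical names).
horizons <- c("1 Y" = 1, "3 Y" = 3, "10 Y" = 10)

# One copy of the rows per horizon; rows in overlapping subsets are repeated.
stacked <- do.call(rbind, lapply(names(horizons), function(h) {
  sub <- toy[toy$period <= horizons[[h]], , drop = FALSE]
  sub$horizon <- h
  sub
}))

# Fix the level order so the legend reads 1 Y, 3 Y, 10 Y.
stacked$horizon <- factor(stacked$horizon, levels = names(horizons))

p <- ggplot(stacked, aes(x = BEST_CUR_EV_TO_EBITDA, fill = horizon)) +
  geom_density(alpha = .2) +
  scale_fill_manual(name = "horizon",
                    values = c("1 Y" = "grey", "3 Y" = "red", "10 Y" = "green"))
p
```

The trade-off is memory: rows belonging to several horizons are duplicated, which is harmless at this size but worth keeping in mind for large datasets.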


Upvotes: 3
