T-T
T-T

Reputation: 713

How to create columns based on multiple conditions on the other data frame

I have two data frames like below:

df = data.frame(Vintage = c(2016,2017,2018,2019),
                Mean = c(6.9,11.5,7.5,11.9),
                Upper = c(10.0,14.5,13.2,14.9),
                Median = c(8.3,10.9,10.2,12.1),
                Lower = c(5.3,8.2,6.3,9.4),
                Deviation = c(6.5,5.1,9.3,5.9))
df
  Vintage Mean Upper Median Lower Deviation
1    2016  6.9  10.0    8.3   5.3       6.5
2    2017 11.5  14.5   10.9   8.2       5.1
3    2018  7.5  13.2   10.2   6.3       9.3
4    2019 11.9  14.9   12.1   9.4       5.9
df1 = data.frame(Name = c("A","B","C"),
                 Year = c(2017,2018,2019),
                 Performance = c(7.7,7.2,15.2))
df1
  Name Year Performance
1    A 2017         7.7
2    B 2018         7.2
3    C 2019        15.2

I'd like to add two columns to df1 based on the following conditions:

  1. df1$Quartile: when df1$Year = df$Vintage,
    • if df1$Performance > df$Upper, then "Fourth";
    • if df$Upper>df1$Performance > df$Median, then "Third";
    • if df$Median>df1$Performance > df$Lower, then "Second";
    • if df$Lower>df1$Performance , then "First".
  2. df1$Z_Score = (df1$Performance - df$Mean) / df$Deviation when df1$Year = df$Vintage.

The result should look like this:

  Name Year Performance Quartile Z_Score
1    A 2017         7.7    First   -0.75
2    B 2018         7.2   Second   -0.03
3    C 2019        15.2   Fourth    0.56

Upvotes: 0

Views: 50

Answers (1)

IceCreamToucan
IceCreamToucan

Reputation: 28675

library(dplyr)

df %>% 
  inner_join(df1, by = c(Vintage = 'Year')) %>% 
  mutate(Quartile = case_when(Performance > Upper ~ 'Fourth',
                              Performance > Median ~ 'Third',
                              Performance > Lower ~ 'Second',
                              TRUE ~ 'First'),
         Z_Score = (Performance - Mean)/Deviation) %>% 
  select(Name, Year = Vintage, Performance, Quartile, Z_Score)

#   Name Year Performance Quartile     Z_Score
# 1    A 2017         7.7    First -0.74509804
# 2    B 2018         7.2   Second -0.03225806
# 3    C 2019        15.2   Fourth  0.55932203

You could also use cut instead of dplyr::case_when (as @akrun suggests in comments). Same output as above, except Quartile is now a factor instead of character.

df %>% 
  inner_join(df1, by = c(Vintage = 'Year')) %>% 
  rowwise %>% 
  mutate(Quartile = cut(Performance, c(0, Lower, Median, Upper, Inf), 
                        c('First', 'Second', 'Third', 'Fourth')),
         Z_Score = (Performance - Mean)/Deviation) %>% 
  select(Name, Year = Vintage, Performance, Quartile, Z_Score)

data.table option which modifies df1 rather than creating a new data.frame

df1[df, on = .(Year = Vintage),
    ':='(Quartile = 
           mapply(function(p, l, m, u)
                    cut(p, c(0, l, m, u, Inf), c('First', 'Second', 'Third', 'Fourth')),
                  Performance, i.Lower, i.Median, i.Upper),
         Z_Score = (Performance - i.Mean)/i.Deviation)]


df1
#    Name Year Performance Quartile     Z_Score
# 1:    A 2017         7.7    First -0.74509804
# 2:    B 2018         7.2   Second -0.03225806
# 3:    C 2019        15.2   Fourth  0.55932203

Upvotes: 2

Related Questions