Reputation: 713
I have two data frames like below:
df = data.frame(Vintage = c(2016,2017,2018,2019),
Mean = c(6.9,11.5,7.5,11.9),
Upper = c(10.0,14.5,13.2,14.9),
Median = c(8.3,10.9,10.2,12.1),
Lower = c(5.3,8.2,6.3,9.4),
Deviation = c(6.5,5.1,9.3,5.9))
df
Vintage Mean Upper Median Lower Deviation
1 2016 6.9 10.0 8.3 5.3 6.5
2 2017 11.5 14.5 10.9 8.2 5.1
3 2018 7.5 13.2 10.2 6.3 9.3
4 2019 11.9 14.9 12.1 9.4 5.9
df1 = data.frame(Name = c("A","B","C"),
Year = c(2017,2018,2019),
Performance = c(7.7,7.2,15.2))
df1
Name Year Performance
1 A 2017 7.7
2 B 2018 7.2
3 C 2019 15.2
I'd like to add two columns to df1
based on the following conditions:
df1$Quartile
: when df1$Year
= df$Vintage
,
df1$Performance
> df$Upper
, then "Fourth";df$Upper
>df1$Performance
> df$Median
, then "Third";df$Median
>df1$Performance
> df$Lower
, then "Second";df$Lower
>df1$Performance
, then "First".df1$Z_Score
= (df1$Performance - df$Mean) / df$Deviation
when df1$Year
= df$Vintage
.The result should look like this:
Name Year Performance Quartile Z_Score
1 A 2017 7.7 First -0.75
2 B 2018 7.2 Second -0.03
3 C 2019 15.2 Fourth 0.56
Upvotes: 0
Views: 50
Reputation: 28675
library(dplyr)
df %>%
inner_join(df1, by = c(Vintage = 'Year')) %>%
mutate(Quartile = case_when(Performance > Upper ~ 'Fourth',
Performance > Median ~ 'Third',
Performance > Lower ~ 'Second',
TRUE ~ 'First'),
Z_Score = (Performance - Mean)/Deviation) %>%
select(Name, Year = Vintage, Performance, Quartile, Z_Score)
# Name Year Performance Quartile Z_Score
# 1 A 2017 7.7 First -0.74509804
# 2 B 2018 7.2 Second -0.03225806
# 3 C 2019 15.2 Fourth 0.55932203
You could also use cut
instead of dplyr::case_when
(as @akrun suggests in comments). Same output as above, except Quartile is now a factor instead of character.
df %>%
inner_join(df1, by = c(Vintage = 'Year')) %>%
rowwise %>%
mutate(Quartile = cut(Performance, c(0, Lower, Median, Upper, Inf),
c('First', 'Second', 'Third', 'Fourth')),
Z_Score = (Performance - Mean)/Deviation) %>%
select(Name, Year = Vintage, Performance, Quartile, Z_Score)
data.table option which modifies df1
rather than creating a new data.frame
df1[df, on = .(Year = Vintage),
':='(Quartile =
mapply(function(p, l, m, u)
cut(p, c(0, l, m, u, Inf), c('First', 'Second', 'Third', 'Fourth')),
Performance, i.Lower, i.Median, i.Upper),
Z_Score = (Performance - i.Mean)/i.Deviation)]
df1
# Name Year Performance Quartile Z_Score
# 1: A 2017 7.7 First -0.74509804
# 2: B 2018 7.2 Second -0.03225806
# 3: C 2019 15.2 Fourth 0.55932203
Upvotes: 2