I have this example of a dataset I'm working with:
ex <- structure(list(reg_desc = c("1-Northeast Region", "1-Northeast Region",
"1-Northeast Region", "1-Northeast Region", "1-Northeast Region"
), state = c("04-Connecticut", "05-Maine", "04-Connecticut",
"05-Maine", NA), trigger_city = c("14860-Bridgeport-Stamford-Norwalk",
"12620-Bangor", NA, NA, NA), Category = c("M", "M", "S", "S",
"R"), Cred_Fac = c(0, 0, 0.317804971641414, 0, 1), Mean = c(50323.3311111111,
48944.4266666667, 44220.8220792079, 43724.1495, 50492.0654351396
)), row.names = c(1L, 7L, 118L, 119L, 136L), class = "data.frame")
I have a Category column where M marks a metropolitan-level row, S a state-level row, and R a region-level row. I want to create a new column based on an if-else rule, but can't seem to get it right.
The code I tried is:
# my attempt (it does not run)
ex %>% mutate(New_Mean =
  if(any(Cred_Fac == 1) Mean
  else if(any(Cred_Fac < 1)
    if(Category == 'S' & Cred_Fac == 1) M_Mean * M_Cred+R_Mean*(1-M_Cred_Fac)
    if(Category == 'R' & Cred_Fac == 1) M_Mean * M_Cred+R_Mean*(1-M_Cred_Fac))
My logic should be: if a row's Cred_Fac is 1, keep its Mean as is. If it is less than 1, move up to the state level; if the state's Cred_Fac is 1, use Mean * Cred_Fac + (state Mean) * (1 - Cred_Fac). If the state's Cred_Fac is not 1 either, repeat the process one level up, at the region.
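The rule above is a nested if-else that has to be applied row by row. `if` only looks at a single value, whereas `ifelse()` works element by element, which is what a row-wise rule needs. A minimal base-R sketch, assuming each row already carries its state's and region's Cred_Fac and Mean (as in the hi1 table built below); `cascade_mean()` and its argument names are hypothetical:

```r
# Minimal base-R sketch of the cascade. Assumes every row already carries its
# state's and region's Cred_Fac and Mean (as in hi1). cascade_mean() and its
# argument names are hypothetical, not from any package.
cascade_mean <- function(cred, own_mean, state_cred, state_mean, reg_cred, reg_mean) {
  own_ok   <- cred == 1                                   # row is fully credible
  state_ok <- !own_ok & !is.na(state_cred) & state_cred == 1
  reg_ok   <- !own_ok & !state_ok & !is.na(reg_cred) & reg_cred == 1
  # ifelse() is vectorised, so each row gets its own branch
  ifelse(own_ok, own_mean,
         ifelse(state_ok, own_mean * cred + state_mean * (1 - cred),
                ifelse(reg_ok, own_mean * cred + reg_mean * (1 - cred), NA_real_)))
}

# Values copied from hi1 in the question
new_mean <- cascade_mean(
  cred       = c(0, 0, 0.317804971641414, 0, 1),
  own_mean   = c(50323.3311111111, 48944.4266666667, 44220.8220792079,
                 43724.1495, 50492.0654351396),
  state_cred = c(0.317805, 0, NA, NA, NA),
  state_mean = c(44220.82, 43724.15, NA, NA, NA),
  reg_cred   = c(1, 1, 1, 1, NA),
  reg_mean   = c(50492.07, 50492.07, 50492.07, 50492.07, NA)
)
print(round(new_mean, 2))
```

In this sample no state has Cred_Fac equal to 1, so rows 1, 2 and 4 fall through to the region (50492.07), row 3 blends its own Mean with the region's, and row 5 keeps its own Mean.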
One idea I had was to add columns so that each row also carries its state's and region's information, for example:
hi1 <- data.frame(
  reg_desc = c("1-Northeast Region", "1-Northeast Region", "1-Northeast Region",
               "1-Northeast Region", "1-Northeast Region"),
  state = c("04-Connecticut", "05-Maine", "04-Connecticut", "05-Maine", NA),
  trigger_city = c("14860-Bridgeport-Stamford-Norwalk", "12620-Bangor", NA, NA, NA),
  Category = c("M", "M", "S", "S", "R"),
  Cred_Fac = c(0, 0, 0.317804971641414, 0, 1),
  Mean = c(50323.3311111111, 48944.4266666667, 44220.8220792079,
           43724.1495, 50492.0654351396),
  # state- and region-level figures, typed in by hand
  State_Cred_Fac = c(0.317805, 0.000000, NA, NA, NA),
  Mean_State = c(44220.82, 43724.15, NA, NA, NA),
  Reg_Cred_Fac = c(1.000000, 1.000000, 1.000000, 1.000000, NA),
  Mean_Region = c(50492.07, 50492.07, 50492.07, 50492.07, NA))
Afterwards, I could just do:
new <- hi1 %>% mutate(New_Mean = ifelse(Cred_Fac == 1, Mean,
  ifelse(Cred_Fac < 1 & (State_Cred_Fac == 1 & !is.na(State_Cred_Fac)), Mean*Cred_Fac + Mean_State*(1-Cred_Fac),
  ifelse(Cred_Fac < 1 & Reg_Cred_Fac == 1, Mean*Cred_Fac + Mean_Region*(1-Cred_Fac), NA))))
This gives me the final result I'm looking for, but is there a way to do this row-wise without inserting the new columns? I typed them in by hand for this small example, so I'm not sure how I would create columns like Mean_State and State_Cred_Fac on a larger dataset without entering the values manually. Any suggestions and advice would be greatly appreciated!
Here's an approach that joins the matching state- and region-level figures onto every row, so each row's calculation can use them.
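library(tidyverse)
ex_augmented <- ex %>%
  # attach the region's credibility and mean to every row in that region
  left_join(ex %>% filter(Category == "R") %>%
              select(reg_desc, R_Cred_Fac = Cred_Fac, R_Mean = Mean),
            by = "reg_desc") %>%
  # attach the state's credibility and mean to every row in that state
  left_join(ex %>% filter(Category == "S") %>%
              select(state, S_Cred_Fac = Cred_Fac, S_Mean = Mean),
            by = "state") %>%
  mutate(M_Cred = if_else(Category == "M", Cred_Fac, 0),
         M_Mean = if_else(Category == "M", Mean, 0),
         # replace_na() won't cast 0 to character (tidyr >= 1.2), so fill each type separately
         across(where(is.numeric), ~replace_na(.x, 0)),
         across(where(is.character), ~replace_na(.x, "0"))) %>%
  select(-Cred_Fac, -Mean)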
#> ex_augmented
#            reg_desc          state                      trigger_city Category R_Cred_Fac   R_Mean S_Cred_Fac   S_Mean M_Cred   M_Mean
#1 1-Northeast Region 04-Connecticut 14860-Bridgeport-Stamford-Norwalk        M          1 50492.07   0.317805 44220.82      0 50323.33
#2 1-Northeast Region       05-Maine                      12620-Bangor        M          1 50492.07   0.000000 43724.15      0 48944.43
#3 1-Northeast Region 04-Connecticut                                 0        S          1 50492.07   0.317805 44220.82      0     0.00
#4 1-Northeast Region       05-Maine                                 0        S          1 50492.07   0.000000 43724.15      0     0.00
#5 1-Northeast Region              0                                 0        R          1 50492.07   0.000000     0.00      0     0.00
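If you'd rather not depend on the tidyverse, the same lookup can be sketched in base R with match(), which also shows how to build columns like State_Cred_Fac and Mean_Region for a larger dataset without typing them in. A minimal sketch, assuming exactly one S row per state and one R row per region; `df_ex` is a hypothetical trimmed copy of `ex` (trigger_city dropped), and the new columns reuse the names from hi1:

```r
# Base-R sketch: look up state- and region-level figures with match().
# Assumes one "S" row per state and one "R" row per region. df_ex is a
# hypothetical trimmed copy of `ex` (trigger_city dropped for brevity).
df_ex <- data.frame(
  reg_desc = rep("1-Northeast Region", 5),
  state    = c("04-Connecticut", "05-Maine", "04-Connecticut", "05-Maine", NA),
  Category = c("M", "M", "S", "S", "R"),
  Cred_Fac = c(0, 0, 0.317804971641414, 0, 1),
  Mean     = c(50323.3311111111, 48944.4266666667, 44220.8220792079,
               43724.1495, 50492.0654351396)
)

st  <- df_ex[df_ex$Category == "S", ]   # one row per state
reg <- df_ex[df_ex$Category == "R", ]   # one row per region

# Row index into st / reg for every row (NA when there is no match)
i_st  <- match(df_ex$state, st$state, incomparables = NA)
i_reg <- match(df_ex$reg_desc, reg$reg_desc)

# Mirror hi1: state figures only on M rows, region figures on all but R rows
i_st[df_ex$Category != "M"]  <- NA
i_reg[df_ex$Category == "R"] <- NA

df_ex$State_Cred_Fac <- st$Cred_Fac[i_st]
df_ex$Mean_State     <- st$Mean[i_st]
df_ex$Reg_Cred_Fac   <- reg$Cred_Fac[i_reg]
df_ex$Mean_Region    <- reg$Mean[i_reg]
df_ex
```

Unlike the hand-typed hi1, the looked-up values keep full precision. Dropping the two NA-masking lines would fill the state and region figures on every row instead.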
Now the calculation itself is, to me, simpler. My result differs from yours for the first row, because I used what seemed a simpler way to handle partial credibility: weight each level by its credibility and fill the remaining weight from the higher-level figures until the total credibility is 1. So for row 1, with 0% metro credibility, I took 32% State + 68% Region rather than 100% Region. That seemed more consistent to me, but maybe I need to digest your logic more to understand it.
ex_augmented %>%
  # State and Region credibility add successively to Metro to get to 1
  mutate(S_Cred = pmax(0, S_Cred_Fac - M_Cred),
         # subtract the State *weight* (S_Cred) so the weights total 1 even when M_Cred > 0
         R_Cred = pmax(0, R_Cred_Fac - S_Cred - M_Cred)) %>%
  # new_mean is the credibility-weighted average of the three levels
  mutate(new_mean = M_Mean * M_Cred + S_Mean * S_Cred + R_Mean * R_Cred) %>%
  # sort the columns for easier reading
  select(reg_desc:Category, M_Mean, S_Mean, R_Mean, M_Cred, S_Cred, R_Cred, new_mean)
            reg_desc          state                      trigger_city Category   M_Mean   S_Mean   R_Mean M_Cred   S_Cred   R_Cred new_mean
1 1-Northeast Region 04-Connecticut 14860-Bridgeport-Stamford-Norwalk        M 50323.33 44220.82 50492.07      0 0.317805 0.682195 48499.03
2 1-Northeast Region       05-Maine                      12620-Bangor        M 48944.43 43724.15 50492.07      0 0.000000 1.000000 50492.07
3 1-Northeast Region 04-Connecticut                                 0        S     0.00 44220.82 50492.07      0 0.317805 0.682195 48499.03
4 1-Northeast Region       05-Maine                                 0        S     0.00 43724.15 50492.07      0 0.000000 1.000000 50492.07
5 1-Northeast Region              0                                 0        R     0.00     0.00 50492.07      0 0.000000 1.000000 50492.07
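The row-1 figure can be checked by hand, and the weighting is easy to test on its own. A small base-R sketch; `blend_weights()` is a hypothetical helper that gives each level only the credibility it adds beyond the levels below it, and takes the region weight net of the state *weight* (not the raw state Cred_Fac), so the three weights add up to the region's Cred_Fac (1 in this data) even when metro credibility is partial:

```r
# Hypothetical helper: each level gets only the credibility it adds beyond
# the levels below it. The region weight subtracts the *state weight* s_w,
# so M + S + R adds up to r_cred_fac whenever r_cred_fac >= m_cred + s_w.
blend_weights <- function(m_cred, s_cred_fac, r_cred_fac) {
  s_w <- pmax(0, s_cred_fac - m_cred)
  r_w <- pmax(0, r_cred_fac - s_w - m_cred)
  cbind(M = m_cred, S = s_w, R = r_w)
}

# Row 1 of the example: no metro credibility
w1   <- blend_weights(m_cred = 0, s_cred_fac = 0.317804971641414, r_cred_fac = 1)
row1 <- sum(w1 * c(50323.3311111111, 44220.8220792079, 50492.0654351396))
print(round(row1, 2))

# A made-up row with partial metro credibility: weights still total 1
w2 <- blend_weights(m_cred = 0.5, s_cred_fac = 0.3, r_cred_fac = 1)
print(w2)
print(rowSums(w2))
```

Row 1 comes out at 48499.03, matching the table. The second call shows why the region weight is taken net of the state weight: subtracting the raw state Cred_Fac (0.3) there would leave the weights totalling 0.7 instead of 1.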