Reputation: 983
I have a dataframe that covers multiple years like this:
library(dplyr)
df <- tibble(good_2018 = 0,
bad_2018 = 1,
id_2018 = 0,
good_2019 = 3,
bad_2019 = 1,
id_2019 = 1)
I want to derive new columns based on the data for each year t (e.g., 2018 and 2019). If the id variable for year t equals 0, the outcome should be 0; otherwise, it should be the percentage identified as good for year t, i.e. 100 * good / (good + bad). The resulting dataset should look like this:
df %>%
mutate(pct_good_2018 = if_else(id_2018 == 0, 0,
100*good_2018/(good_2018 + bad_2018)),
pct_good_2019 = if_else(id_2019 == 0, 0,
100*good_2019/(good_2019 + bad_2019)))
#> # A tibble: 1 × 8
#> good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good…¹
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1 0 3 1 1 0 75
#> # … with abbreviated variable name ¹pct_good_2019
Instead of generating the pct_good columns for each year individually, I would like to use the purrr package, but I cannot figure out how to do it. I believe it requires rlang, but the various configurations of != and {{}} that I try yield errors that I do not understand.
Upvotes: 0
Views: 84
Reputation: 24742
You can try this approach using data.table: set df to a data.table, and make a vector of yrs.
library(data.table)
setDT(df)
yrs = c("2018","2019")
f <- function(d) fifelse(d[3]==0,0,d[1]*100/(d[1]+d[2]))
df[, (paste0("pct_good_",yrs)):=lapply(yrs, \(y) {.SD[,f(t(.SD)),.SDcols = patterns(paste0("_",y,"$"))]}), by=.I]
Output:
good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good_2019
1: 0 1 0 3 1 1 0 75
However, as pointed out in the comments on the question, you are generally better off with long-format data.
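For reference, here is a minimal sketch of what the long-format route could look like. It assumes tidyr is available and that every column follows the name_year pattern; the names long, year and pct_good are just illustrative choices, not something from the question.

```r
library(dplyr)
library(tidyr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

# Split each column name at "_": the first part (good/bad/id) becomes a
# column via the ".value" sentinel, the second part becomes the year.
long <- df %>%
  pivot_longer(everything(),
               names_to = c(".value", "year"),
               names_sep = "_") %>%
  # One row per year, so the calculation is written only once
  mutate(pct_good = if_else(id == 0, 0, 100 * good / (good + bad)))

long
```

With one row per year, adding more years needs no extra code, and the result can be widened again with pivot_wider if the original layout is required.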
Upvotes: 1
Reputation: 9868
We can use glue to create dynamic column names to use in a custom function:
library(purrr)
library(glue)
pct_good <-function(df, year) {
if_else(pull(df, glue('id_{year}')) == 0,
0,
100 * pull(df, glue('good_{year}')) / (pull(df, glue('good_{year}')) + pull(df, glue('bad_{year}'))))
}
Then we can use purrr::map_dfc to create a dataframe column for every iteration:
df %>%
mutate(map_dfc(c(2018, 2019), ~pct_good(df, .x)))
# A tibble: 1 × 8
good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 ...1 ...2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 1 0 3 1 1 0 75
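The new columns come out as ...1 and ...2 because the input vector is unnamed. A possible tweak, sketched below, is to name the years before mapping so that the names carry through to the result. The helper is rewritten with [[ ]] and paste0 only to keep the snippet self-contained, and named_years and result are illustrative names.

```r
library(dplyr)
library(purrr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

# Same calculation as pct_good() above, using [[ ]] instead of pull() + glue
pct_good <- function(df, year) {
  good <- df[[paste0("good_", year)]]
  bad  <- df[[paste0("bad_", year)]]
  id   <- df[[paste0("id_", year)]]
  if_else(id == 0, 0, 100 * good / (good + bad))
}

years <- c(2018, 2019)
# Naming the input makes map_dfc() return columns with those names
named_years <- set_names(years, paste0("pct_good_", years))

result <- df %>%
  mutate(map_dfc(named_years, ~ pct_good(df, .x)))

result
```

Note that map_dfc is marked as superseded in recent purrr releases, although it still works.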
Upvotes: 2