Thomas Philips
Thomas Philips

Reputation: 1089

Using dynamic names with dplyr

I have a data frame in which I would I would like to compute some extra column as a function of the existing columns, but want to specify both each new column name and the function dynamically. I have a vector of column names that are already in the dataframe df_daily:

DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")

The rows of the dataframe have responses to each question from each user each time they answer the questionnaire, as well as a column with the number of days since the user first answered the questionnaire (i.e. Days_From_First_Use = 0 on the very first use, = 1 if it is used the next day etc.). I want to average the responses to these questions by Days_From_First_Use . I start by by grouping my dataframe by Days_From_First_Use:

df_test <- df_daily %>%
             group_by(Days_From_First_Use)

and then try averaging the responses in a loop as follows:

for(i in 1:5){
   df_test <- df_test %>%
                 mutate(!! paste0('Avg_Score_', DAILY_QUESTIONS[i]) := 
                 paste0('mean(', DAILY_QUESTIONS[i], ')'))
}

Unfortunately, while my new variable names are correct ("Avg_Score_Q1_Daily", "Avg_Score_Q2_Daily", "Avg_Score_Q3_Daily", "Avg_Score_Q4_Daily", "Avg_Score_Q5_Daily"), my answers are not: every row in my data frame has a string such as "mean(Q1_Daily)" in the relevant column .

So I'm clearly doing something wrong - what do I need to do fix this and get the average score across all users on each day?

Sincerely and with many thanks in advance

Thomas Philips

Upvotes: 0

Views: 377

Answers (1)

Andrew Brown
Andrew Brown

Reputation: 1065

I took a somewhat different approach, using summarize(across(...)) after group_by(Days_From_First_Use) I achieve the dynamic names by using rename_with and a custom function that replaces (starts with)"Q" with "Avg_Score_Q"

library(dplyr, warn.conflicts = FALSE)

# fake data -- 30 normalized "responses" from 0 to 2 days from first use to 5 questions
DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")
df_daily <- as.data.frame(do.call('cbind', lapply(1:5, function(i) rnorm(30, i))))
colnames(df_daily) <- DAILY_QUESTIONS
df_daily$Days_From_First_Use <- floor(runif(30, 0, 3))

df_test <- df_daily %>%
  group_by(Days_From_First_Use) %>%
  summarize(across(.fns = mean)) %>%
  rename_with(.fn  = function(x) gsub("^Q","Avg_Score_Q",x))
#> `summarise()` ungrouping output (override with `.groups` argument)

df_test  
#> # A tibble: 3 x 6
#>   Days_From_First… Avg_Score_Q1_Da… Avg_Score_Q2_Da… Avg_Score_Q3_Da…
#>              <dbl>            <dbl>            <dbl>            <dbl>
#> 1                0            1.26              1.75             3.02
#> 2                1            0.966             2.14             3.48
#> 3                2            1.08              2.45             3.01
#> # … with 2 more variables: Avg_Score_Q4_Daily <dbl>, Avg_Score_Q5_Daily <dbl>

Created on 2020-12-06 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions