MegPD

Reputation: 1

Combining for loop and mutate within a function for multiple data frames in R

I'm hoping I can get some insight into my problem. Basically, I want to create the same set of variables within several different data sets (they are subsets of the original dataset based on outliers, protest answers, etc.). I have tried many things, but I'm completely stumped.

Firstly, I realize that using eval(parse()) is discouraged; however, I am not the most efficient coder and this was working for my purposes. My aim is to create a set of variables, X_3, X_4, X_5, etc., holding the yearly amount (i.e. multiplied by 12) of another set of variables, z3_1, z4_1, z5_1, etc. For this I have the code below, which works when I manually put each data set's name in place of data.

Edit: data expectations

#what it looks like now
  responseid   z3_1 z4_1 z5_1
1          1  4.720 7.08   NA
2          2  1.180   NA 1.18
3          3  1.180   NA 1.18
4          4  2.596 3.54   NA
5          5 15.340   NA   NA
6          6  2.360   NA 2.36

#what I'd like it to look like:
  responseid   z3_1 z4_1 z5_1    X_3  X_4  X_5
1          1  4.720 7.08   NA  56.640 84.96    NA
2          2  1.180   NA 1.18  14.160    NA 14.16
3          3  1.180   NA 1.18  14.160    NA 14.16
4          4  2.596 3.54   NA  31.152 42.48    NA
5          5 15.340   NA   NA 184.080    NA    NA
6          6  2.360   NA 2.36  28.320    NA 28.32

#dput
#original
structure(list(responseid = c(1L, 2L, 3L, 4L, 5L, 
6L), z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36), z4_1 = c(7.08, 
NA, NA, 3.54, NA, NA), z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)), class = "data.frame", row.names = c(NA, 6L))

#expected
structure(list(responseid = c(1L, 2L, 3L, 4L, 5L, 
6L), z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36), z4_1 = c(7.08, 
NA, NA, 3.54, NA, NA), z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36), 
    X_3 = c(56.64, 14.16, 14.16, 31.152, 184.08, 28.32), X_4 = c(84.96, 
    NA, NA, 42.48, NA, NA), X_5 = c(NA, 14.16, 14.16, NA, NA, 
    28.32)), class = "data.frame", row.names = c(NA, 6L))

But I'd like to do this for several data sets, which is why I'm looking to write a function that I can run on each data frame.

library(dplyr)

for (i in 3:9){
  X.varname <- paste0("X_", i)
  data <- data %>%
    mutate(
      !!X.varname := eval(parse(text = paste0("z", i, "_1"))) * 12
    )
}

However, when I try to put this in a function (so I can run it over a list of data frames) nothing happens:

f.test <- function(data){
  for (i in 3:9){
    X.varname <- paste0("X_", i)
    data <- data %>%
      mutate(
        !!X.varname := eval(parse(text = paste0("z", i, "_1"))) * 12
      )
  }
}

Does anyone have any idea why this might be? This is my first question on Stack Overflow, so I apologize if any of the formatting is incorrect.

Upvotes: 0

Views: 246

Answers (2)

koenniem

Reputation: 576

Nothing happens because, inside the function, data refers to the function's argument, which is a local copy and not the data object in your global environment; reassigning it inside the function leaves the global object untouched. You could use assign("data", YourAssignedValue, envir = .GlobalEnv) or the superassignment operator <<-. Alternatively, since you want to replace the original data, simply return data like so:

f.test <- function(data){
    for (i in 3:5){
        X.varname <- paste0("X_",i)
        data <- data %>%
            mutate(!!X.varname := eval(parse(text=paste0("z", i, "_1")))*12)
    }
    data
}
data <- f.test(data)
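A minimal base-R sketch of the scoping point above (the function names are hypothetical): a function works on its own copy of the argument, so changes made inside it are only visible outside if you return the result and assign it back.

```r
# Modifies its local copy of `data`, then returns nothing useful
# (like a function whose last expression is a for loop)
add_col_no_return <- function(data) {
  data$X_3 <- data$z3_1 * 12
  invisible(NULL)
}

# Same modification, but returns the modified copy
add_col_return <- function(data) {
  data$X_3 <- data$z3_1 * 12
  data
}

df <- data.frame(responseid = 1:3, z3_1 = c(4.72, 1.18, 2.36))

add_col_no_return(df)
"X_3" %in% names(df)   # FALSE: the df in the global environment is unchanged

df <- add_col_return(df)
"X_3" %in% names(df)   # TRUE: the returned copy was assigned back to df
```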

If I may suggest another approach: you could put your data sets in a nested tibble (created with, for example, tibble::tribble) and then perform the mutations etc. using purrr::map.
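A rough sketch of the nested-tibble idea, assuming dplyr, purrr and tibble are installed. It builds the tibble with tibble::tibble() and a list-column rather than tribble(); the add_yearly() helper, the idx argument and the subset used as a second data set are hypothetical stand-ins for your outlier/protest subsets.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- data.frame(
  responseid = 1:6,
  z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36),
  z4_1 = c(7.08, NA, NA, 3.54, NA, NA),
  z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)
)

# Hypothetical helper: add X_i = z_i_1 * 12 for every i in idx
add_yearly <- function(data, idx = 3:5) {
  for (i in idx) {
    data[[paste0("X_", i)]] <- data[[paste0("z", i, "_1")]] * 12
  }
  data
}

# One row per data set; the data frames live in the list-column `data`
datasets <- tibble(
  name = c("all", "z3_below_10"),
  data = list(df, filter(df, z3_1 < 10))
)

# Apply the same function to every stored data frame
datasets <- datasets %>%
  mutate(data = map(data, add_yearly))

datasets$data[[2]]
```

Keeping the data sets in one tibble means a single map() call updates all of them, and the name column records which subset each one is.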

Upvotes: 0

GGamba

Reputation: 13680

Nothing happens because the function does not return anything useful. Usually you don't need an explicit return() in R, because a function automatically returns the value of the last expression it evaluates. But here that last expression is a for loop, which evaluates to NULL (invisibly), so the function returns nothing.
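A tiny sketch of that implicit-return rule (both function names are hypothetical):

```r
# The last expression is the for loop itself, so the function returns NULL
last_is_loop <- function() {
  for (i in 1:3) {
    x <- i * 12
  }
}

# Same loop, but x is evaluated afterwards, so its value is returned
last_is_value <- function() {
  for (i in 1:3) {
    x <- i * 12
  }
  x
}

is.null(last_is_loop())  # TRUE
last_is_value()          # 36
```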

To fix the problem, simply specify what your function should return, i.e. data:

f.test <- function(data){
  for (i in 3:5){
    X.varname <- paste0("X_",i)
    data <- data %>%
      mutate(
        !!X.varname := eval(parse(text=paste0("z", i, "_1")))*12
      )
  }
  return(data)
}

By the way, you are right that eval(parse(x)) is discouraged: it is hard to debug, it will run whatever code the string contains, and it is slow.
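If you would rather keep the loop, a minimal sketch of dropping eval(parse()) in favour of the .data pronoun, which looks a column up by its name given as a string. It assumes a reasonably recent dplyr; the idx argument is a hypothetical addition so the loop can be limited to the columns that actually exist in the example data.

```r
library(dplyr)

df <- data.frame(
  responseid = 1:6,
  z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36),
  z4_1 = c(7.08, NA, NA, 3.54, NA, NA),
  z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)
)

f.test <- function(data, idx = 3:5) {
  for (i in idx) {
    X.varname <- paste0("X_", i)       # new column, e.g. "X_3"
    z.varname <- paste0("z", i, "_1")  # source column, e.g. "z3_1"
    data <- data %>%
      # .data[[z.varname]] selects the column by name; no parsing needed
      mutate(!!X.varname := .data[[z.varname]] * 12)
  }
  data
}

df <- f.test(df)
```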

I'm not sure whether what you showed us is exactly what you are trying to do, but a better way to do it would be:

library(dplyr)
library(stringr)


my_fun <- function(data) {
  data %>% 
    mutate_at(vars(starts_with("z")), .funs = list(TBC=~.*12)) %>% 
    rename_at(vars(ends_with('TBC')), .funs = ~{.x %>% str_extract('\\d') %>% str_c('X_', .)})
}
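In dplyr 1.0.0 and later, mutate_at() and rename_at() are superseded by across() and rename_with(). A sketch of the same idea with those functions, assuming dplyr >= 1.0.0 (my_fun2 and the temporary _yr suffix are hypothetical names):

```r
library(dplyr)

df <- data.frame(
  responseid = 1:6,
  z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36),
  z4_1 = c(7.08, NA, NA, 3.54, NA, NA),
  z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)
)

my_fun2 <- function(data) {
  data %>%
    # Multiply every z<number>_1 column by 12, storing it as z<number>_1_yr
    mutate(across(matches("^z\\d+_1$"), ~ .x * 12, .names = "{.col}_yr")) %>%
    # Rename z<number>_1_yr to X_<number>
    rename_with(~ sub("^z(\\d+)_1_yr$", "X_\\1", .x), ends_with("_yr"))
}

my_fun2(df)
```

Using \\d+ in the pattern also handles multi-digit indices such as z10_1.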

Moreover, as you need to do this for multiple data frames, you could follow @koenniem's suggestion and use purrr::map(). Put every data frame in a list and apply the function to each element of the list:

library(purrr)
# df is the example data frame from the question (the "original" dput)
df2 <- df %>% mutate_at(vars(-responseid), ~.+4)

df_t <- list(df, df2)

df_t %>% 
  map_df(my_fun)
#>    responseid   z3_1  z4_1 z5_1     X_3    X_4   X_5
#> 1           1  4.720  7.08   NA  56.640  84.96    NA
#> 2           2  1.180    NA 1.18  14.160     NA 14.16
#> 3           3  1.180    NA 1.18  14.160     NA 14.16
#> 4           4  2.596  3.54   NA  31.152  42.48    NA
#> 5           5 15.340    NA   NA 184.080     NA    NA
#> 6           6  2.360    NA 2.36  28.320     NA 28.32
#> 7           1  8.720 11.08   NA 104.640 132.96    NA
#> 8           2  5.180    NA 5.18  62.160     NA 62.16
#> 9           3  5.180    NA 5.18  62.160     NA 62.16
#> 10          4  6.596  7.54   NA  79.152  90.48    NA
#> 11          5 19.340    NA   NA 232.080     NA    NA
#> 12          6  6.360    NA 6.36  76.320     NA 76.32
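Note that map_df() stacks all the results into a single data frame, so the output above no longer shows which rows came from which data set. A sketch that keeps track of this by naming the list and passing .id to dplyr::bind_rows(); the add_yearly() helper and the list names are hypothetical, and df2 is built as above but with across() (dplyr >= 1.0.0).

```r
library(dplyr)

df <- data.frame(
  responseid = 1:6,
  z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36),
  z4_1 = c(7.08, NA, NA, 3.54, NA, NA),
  z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)
)
df2 <- df %>% mutate(across(-responseid, ~ .x + 4))

# Hypothetical helper: add X_i = z_i_1 * 12 for every i in idx
add_yearly <- function(data, idx = 3:5) {
  for (i in idx) {
    data[[paste0("X_", i)]] <- data[[paste0("z", i, "_1")]] * 12
  }
  data
}

# The list names become the values of the "dataset" column
df_t <- list(original = df, plus_four = df2)

combined <- bind_rows(lapply(df_t, add_yearly), .id = "dataset")
combined[, c("dataset", "responseid", "X_3")]
```

With purrr loaded, map_dfr(df_t, add_yearly, .id = "dataset") should give the same result. If you would rather keep the data sets separate, lapply() or map() on its own returns a list of modified data frames.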

Upvotes: 1
