Reputation: 1
I'm hoping I can get some insight into my problem. Basically, I want to create the same set of variables within several different data sets (they are subsets of the original dataset based on outliers, protest answers, etc.). I have tried many things, but I'm completely stumped.
First, I realize that using eval(parse()) is discouraged; however, I am not the most efficient coder and it was working for my purposes. My aim is to create a set of variables (X_3, X_4, X_5, etc.), each of which is the yearly amount (x12) of one of a set of other variables (z1_1, z2_1, z3_1, etc.). For this I have the code below, which works when I individually put in each dataset name where 'data' is.
Edit: Data expectations
#what it looks like now
responseid z3_1 z4_1 z5_1
1 1 4.720 7.08 NA
2 2 1.180 NA 1.18
3 3 1.180 NA 1.18
4 4 2.596 3.54 NA
5 5 15.340 NA NA
6 6 2.360 NA 2.36
#what I'd like it to look like:
responseid z3_1 z4_1 z5_1 X_3 X_4 X_5
1 1 4.720 7.08 NA 56.640 84.96 NA
2 2 1.180 NA 1.18 14.160 NA 14.16
3 3 1.180 NA 1.18 14.160 NA 14.16
4 4 2.596 3.54 NA 31.152 42.48 NA
5 5 15.340 NA NA 184.080 NA NA
6 6 2.360 NA 2.36 28.320 NA 28.32
#dput
#original
structure(list(responseid = c(1L, 2L, 3L, 4L, 5L,
6L), z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36), z4_1 = c(7.08,
NA, NA, 3.54, NA, NA), z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36)), class = "data.frame", row.names = c(NA, 6L))
#expected
structure(list(responseid = c(1L, 2L, 3L, 4L, 5L,
6L), z3_1 = c(4.72, 1.18, 1.18, 2.596, 15.34, 2.36), z4_1 = c(7.08,
NA, NA, 3.54, NA, NA), z5_1 = c(NA, 1.18, 1.18, NA, NA, 2.36),
X_3 = c(56.64, 14.16, 14.16, 31.152, 184.08, 28.32), X_4 = c(84.96,
NA, NA, 42.48, NA, NA), X_5 = c(NA, 14.16, 14.16, NA, NA,
28.32)), class = "data.frame", row.names = c(NA, 6L))
But I'd like to do this for several datasets, hence why I'm looking to use a function to run across each df.
for (i in 3:9){
  X.varname <- paste0("X_",i)
  data <- data %>%
    mutate(
      !!X.varname := eval(parse(text=paste0("z", i, "_1")))*12
    )
}
However, when I try to put this in a function (so I can run it over a list of data frames) nothing happens:
f.test <- function(data){
  for (i in 3:9){
    X.varname <- paste0("X_",i)
    data <- data %>%
      mutate(
        !!X.varname := eval(parse(text=paste0("z", i, "_1")))*12
      )
  }
}
Does anyone have any idea why this might be? This is my first question on StackOverflow so I apologize if any of the formatting is incorrect.
Upvotes: 0
Views: 246
Reputation: 576
Nothing happens because the assignment inside the function modifies data, the function's argument, which is a local copy and a different object from the data in your global environment. You could use assign("data", YourAssignedValue, envir = .GlobalEnv) or the superassignment operator <<-. Alternatively, since you want to replace the original data, simply return data like so:
f.test <- function(data){
  for (i in 3:5){
    X.varname <- paste0("X_",i)
    data <- data %>%
      mutate(!!X.varname := eval(parse(text=paste0("z", i, "_1")))*12)
  }
  data
}
data <- f.test(data)
If I may suggest another approach: you could collect the data sets in a nested tibble, created with, for example, tibble::tribble, and then perform the mutations etc. using purrr::map.
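As a rough sketch of that idea using only base R (a plain named list instead of a nested tibble, with lapply() playing the role of purrr::map()): put the subsets in a list and apply a function that returns the modified data frame. The names add_yearly, full and no_outliers are made up for illustration, and the toy data is not the asker's.

```r
# Hypothetical helper: adds X_<n> = 12 * z<n>_1 for every z<n>_1 column present.
add_yearly <- function(data) {
  z_cols <- grep("^z[0-9]+_1$", names(data), value = TRUE)
  if (length(z_cols) == 0) return(data)  # nothing to convert
  x_cols <- sub("^z([0-9]+)_1$", "X_\\1", z_cols)
  data[x_cols] <- data[z_cols] * 12
  data  # return the modified copy
}

# Toy stand-ins for the real subsets (names are illustrative).
full <- data.frame(responseid = 1:3,
                   z3_1 = c(4.72, 1.18, 1.18),
                   z4_1 = c(7.08, NA, NA))
no_outliers <- full[full$z3_1 < 4, ]

datasets <- list(full = full, no_outliers = no_outliers)
# purrr::map(datasets, add_yearly) would be the purrr counterpart of this call.
datasets <- lapply(datasets, add_yearly)

names(datasets$no_outliers)
```

Each element of datasets stays a separate data frame, so the subsets can still be analysed individually afterwards.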
Upvotes: 0
Reputation: 13680
Nothing happens because the function does not return anything useful. Usually you don't need an explicit return() in R, because a function returns the value of the last expression it evaluates. Here, however, the last expression is the for loop, and a for loop evaluates to NULL (invisibly).
To avoid the problem, simply specify what your function should return, i.e. data:
f.test <- function(data){
  for (i in 3:5){
    X.varname <- paste0("X_",i)
    data <- data %>%
      mutate(
        !!X.varname := eval(parse(text=paste0("z", i, "_1")))*12
      )
  }
  return(data)
}
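The effect is easy to reproduce in isolation. The two toy functions below (names invented for this illustration) run the same loop; only the second one ends with the value it built up.

```r
# Last expression is the for loop, so the call evaluates to NULL.
sum_no_return <- function(x) {
  for (i in 1:3) x <- x + i
}

# Same loop, but the accumulated value is the last expression.
sum_with_return <- function(x) {
  for (i in 1:3) x <- x + i
  x
}

res_a <- sum_no_return(0)
res_b <- sum_with_return(0)

is.null(res_a)  # TRUE
res_b           # 6
```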
By the way, you are right that eval(parse(text = x)) is discouraged: it is dangerous and slow.
I'm not sure whether what you showed us is really what you are trying to do, but a better way to do it would be:
library(dplyr)
library(stringr)
my_fun <- function(data) {
  data %>%
    mutate_at(vars(starts_with("z")), .funs = list(TBC=~.*12)) %>%
    rename_at(vars(ends_with('TBC')), .funs = ~{.x %>% str_extract('\\d') %>% str_c('X_', .)})
}
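If you would rather keep the shape of the original loop, one way to drop eval(parse()) is to look columns up by name with [[ ]]. This is a base R sketch rather than the dplyr approach above; the function name add_yearly_loop, its ids argument, and the toy data are assumptions for illustration.

```r
# Hypothetical loop-based variant: no eval(parse()), columns are indexed by name.
add_yearly_loop <- function(data, ids = 3:9) {
  for (i in ids) {
    z_name <- paste0("z", i, "_1")
    if (!z_name %in% names(data)) next  # skip z columns this data set lacks
    data[[paste0("X_", i)]] <- data[[z_name]] * 12
  }
  data  # return the modified copy
}

# Toy data, not the asker's.
df <- data.frame(responseid = 1:3,
                 z3_1 = c(4.72, 1.18, NA),
                 z4_1 = c(7.08, NA, 3.54))
out <- add_yearly_loop(df)

names(out)  # "responseid" "z3_1" "z4_1" "X_3" "X_4"
```

Missing values pass through unchanged, since NA * 12 is NA.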
Moreover, as you need to do this for multiple data frames, you could follow @koenniem's suggestion and use purrr::map(). Put every data frame in a list and apply the function to each element of the list (map_df() additionally row-binds the results into a single data frame):
library(purrr)
df2 <- df %>% mutate_at(vars(-responseid), ~.+4)
df_t <- list(df, df2)
df_t %>%
  map_df(my_fun)
#> responseid z3_1 z4_1 z5_1 X_3 X_4 X_5
#> 1 1 4.720 7.08 NA 56.640 84.96 NA
#> 2 2 1.180 NA 1.18 14.160 NA 14.16
#> 3 3 1.180 NA 1.18 14.160 NA 14.16
#> 4 4 2.596 3.54 NA 31.152 42.48 NA
#> 5 5 15.340 NA NA 184.080 NA NA
#> 6 6 2.360 NA 2.36 28.320 NA 28.32
#> 7 1 8.720 11.08 NA 104.640 132.96 NA
#> 8 2 5.180 NA 5.18 62.160 NA 62.16
#> 9 3 5.180 NA 5.18 62.160 NA 62.16
#> 10 4 6.596 7.54 NA 79.152 90.48 NA
#> 11 5 19.340 NA NA 232.080 NA NA
#> 12 6 6.360 NA 6.36 76.320 NA 76.32
Upvotes: 1