bricevk

Reputation: 207

Creating a nested loop to execute a function on a list of variables from a list of data frames

I have three data frames, whose names could be stored like this:

dfs <- list("ibu_819", "ibu_1121", "ibu_1022")

and a list of variables for which I need to complete a very simple operation: changing all the 2s to 0s (an incorrectly coded dummy variable)

vars <- list("bene_lastyear", "bene_nextyear", "child_death","citychild")

I have done so successfully using this clunky code

ibu_819 <- ibu_819 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

ibu_1121 <- ibu_1121 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

ibu_1022 <- ibu_1022 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

I have always performed my data cleaning in Stata, where I would certainly take care of this task in one tidy loop, but I can't figure out how to do so in R. I'd love it if someone could show me how to do exactly what I have done by looping over the two lists provided above, writing the actual mutate call only once.

(also open to suggestions for a prettier solution than my if_else strategy. I'm sure there's a more fluid way to change my 2s to 0s, but I just did what I did because I knew how.)

ALSO, I should note that I do not want to append my data frames just yet, so please don't solve this by combining the data frames and then looping through the variables.

Upvotes: 0

Views: 431

Answers (2)

George Savva

Reputation: 5336

Keeping the data frame names as a list of strings is a bit odd; having a list of the data frames themselves would be better. That is:

dfs <- list(ibu_819, ibu_1121, ibu_1022)

Then you could use:

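for (i in seq_along(dfs)) {
  # index into the list so the change is stored in dfs, not in a loop copy
  for (v in vars) dfs[[i]][[v]][dfs[[i]][[v]] == 2] <- 0
}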

Note that only the copies inside the list are updated; the original data frames are left untouched. To copy them back into the main environment you'd need to use a named list and then the list2env function. So the whole thing would be:

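dfs <- list("ibu_819"=ibu_819, "ibu_1121"=ibu_1121, "ibu_1022"=ibu_1022)
for (i in seq_along(dfs)) {
  # write through the list index; modifying a loop variable would be lost
  for (v in vars) dfs[[i]][[v]][dfs[[i]][[v]] == 2] <- 0
}
list2env(dfs, globalenv())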
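
As a self-contained sketch of the named-list approach, the following uses two made-up data frames (the values are invented for illustration, and only two of the question's variables are included). It writes the results into a fresh environment rather than globalenv(), purely so that it is safe to run anywhere:

```r
# Toy stand-ins for the real data (values are invented for illustration)
ibu_819  <- data.frame(bene_lastyear = c(1, 2, 2), child_death = c(2, 1, 1))
ibu_1121 <- data.frame(bene_lastyear = c(2, 1, 1), child_death = c(1, 1, 2))

vars <- c("bene_lastyear", "child_death")
dfs  <- list(ibu_819 = ibu_819, ibu_1121 = ibu_1121)

# Loop by position so each change is written into the list element itself
for (i in seq_along(dfs)) {
  for (v in vars) {
    dfs[[i]][[v]][dfs[[i]][[v]] == 2] <- 0
  }
}

# Copy the cleaned data frames out of the list. A new environment is used
# here; swap in globalenv() to overwrite the originals instead.
cleaned <- list2env(dfs, envir = new.env())
cleaned$ibu_819$bene_lastyear
```

The names given to the list elements are what list2env uses as object names, so they need to match the original data frame names exactly if the goal is to overwrite them.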

If you want to do it using the list of data frame names (i.e. dfs is the list of strings you currently have), then I think you have to make a copy of the data frame inside the loop and assign it back when you're done. This isn't good practice though.

for (d in dfs){
 df <- get(d)
 for(v in vars) df[[v]][df[[v]]==2] <- 0
 assign(d, df)
}
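
If the names really do start out as strings, one alternative to calling get and assign inside the loop is to collect the data frames into a named list once with mget and then work on the list. A minimal sketch, assuming toy data frames df_a and df_b (hypothetical names and values) exist in the current environment:

```r
# Hypothetical toy data frames standing in for the real ones
df_a <- data.frame(x = c(1, 2), y = c(2, 2))
df_b <- data.frame(x = c(2, 1), y = c(1, 2))

df_names <- c("df_a", "df_b")
vars     <- c("x", "y")

# mget() looks up each name and returns a list named after the objects
dfs <- mget(df_names)

# Recode every listed column in every data frame; lapply returns a new list
dfs <- lapply(dfs, function(df) {
  for (v in vars) df[[v]][df[[v]] == 2] <- 0
  df
})

names(dfs)
```

Because the list carries the original names, list2env(dfs, globalenv()) could then be used to write the cleaned versions back, if that is wanted.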

Finally, that pattern:

x[x==2] <- 0

is how I would replace all the 2s with 0s in a vector. It does the same as replace x=0 if x==2 in Stata.
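
A quick illustration of that idiom on a plain vector (the values are made up). Missing values are left alone, because an NA in the logical index selects nothing when the replacement is a single value:

```r
x <- c(1, 2, NA, 2, 1)

# x == 2 is TRUE, FALSE or NA for each element; only the TRUE positions change
x[x == 2] <- 0

x
```

Unlike the if_else(x == 2, 0, 1) version in the question, this changes only the 2s and leaves every other value as it was.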

Upvotes: 0

Elia

Reputation: 2584

Another option, using Map:

#create dummy data
l <- list(df1 <- data.frame(a=1:10),
df2 <- data.frame(b=1:10),
df3 <- data.frame(c=1:10)
)
var <- c("a","b","c")
#function to replace old values with new one
myfun <- function(df,var){
  df[df[[var]]==2,var] <- 0
  return(df)
}
res <- Map(myfun,l,var)

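Here the original list of data frames is preserved, and all values equal to 2 are updated to 0 in the new list of data frames, called res. Note that Map pairs the i-th data frame with the i-th element of var, so each data frame has only one column changed.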
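
Since Map walks its inputs in parallel, it fits the case of one column per data frame. For the situation in the question, where the same set of columns should be recoded in every data frame, a variant with lapply may be closer to what is needed. This is only a sketch with invented toy data, and recode_cols is a hypothetical helper name:

```r
# Invented toy data: both data frames share the columns to be recoded
l <- list(
  df1 = data.frame(a = c(1, 2, 2), b = c(2, 1, 2), id = c(2, 2, 2)),
  df2 = data.frame(a = c(2, 1, 1), b = c(1, 1, 2), id = c(2, 2, 2))
)
vars <- c("a", "b")

# Hypothetical helper: set 2s to 0 in the chosen columns only
recode_cols <- function(df, cols) {
  df[cols][df[cols] == 2] <- 0
  df
}

# lapply passes the same `cols` to every data frame and keeps the list names
res <- lapply(l, recode_cols, cols = vars)

res$df1
```

The id column is deliberately full of 2s to show that columns outside vars are not touched. The toy data contains no missing values; columns with NAs may need the column-by-column form shown in the other answer.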

Upvotes: 1
