user16881393

Reputation: 33

Looping same function over data frames in R

I'm new to R. I want to recode the same variable across multiple data frames, but I keep getting errors. Below is an example with 3 small data frames. In each of them I want to create a new variable called Q2_mean_nom that is "1" if Q2 is greater than mean(Q2) and "0" otherwise. My code is below.

df1:

Q1 <- c('ABC','DEF','GHI', 'DEF','JKL','XYZ')
Q2 <- c(21000, 23400, 26800, 26000, 20400, 30800)
df1 <- data.frame(Q1, Q2)

df2:

Q1 <- c('DEF','JKL','XYZ', 'ABC', 'MNO', 'PQR')
Q2 <- c(30100, 20200, 15800, 21000, 23400, 26800)
df2 <- data.frame(Q1, Q2)

df3:

Q1 <- c('ABC','DEF','GHI', 'XYZ', 'MNO', 'PQR')
Q2 <- c(17800, 23060, 13080, 27000, 22400, 26500)
df3 <- data.frame(Q1, Q2)

a <- c("Q1", "Q2", "Q3")

for (i in a) {
  newname <- paste(i)
  newname$Q2_mean_nom <- ifelse(newmame$Q2 > mean(newname$Q2, na.rm = TRUE), "1", "0")
}

I noticed that in doing the above, newname is not a df and so the mean won't run. Is there a way to make the loop recognise newname as a df?

I tried using a list but it didn't work either.

newlist <- c(df1, df2, df3)

for (i in 1:length(newlist)) {
  newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE),
"1", "0")
 }

Please help. Thank you very much!

Upvotes: 3

Views: 75

Answers (3)

Ronak Shah

Reputation: 388982

Get the data frames into a named list using mget and paste0. Use lapply to create the new column in each data frame, then use list2env to write those changes back to the individual data frames.

In base R, you can do

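# Collect df1, df2 and df3 into a named list
list_data <- mget(paste0('df', 1:3))
# Add the indicator column (1 if Q2 is above its mean, else 0) to each data frame
list_data <- lapply(list_data, function(x)
  transform(x, Q2_mean_nom = as.integer(Q2 > mean(Q2, na.rm = TRUE))))

# Copy the modified data frames back over df1, df2 and df3 in the global environment
list2env(list_data, .GlobalEnv)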

df1
#   Q1    Q2 Q2_mean_nom
#1 ABC 21000           0
#2 DEF 23400           0
#3 GHI 26800           1
#4 DEF 26000           1
#5 JKL 20400           0
#6 XYZ 30800           1

df2
#   Q1    Q2 Q2_mean_nom
#1 DEF 30100           1
#2 JKL 20200           0
#3 XYZ 15800           0
#4 ABC 21000           0
#5 MNO 23400           1
#6 PQR 26800           1

df3
#   Q1    Q2 Q2_mean_nom
#1 ABC 17800           0
#2 DEF 23060           1
#3 GHI 13080           0
#4 XYZ 27000           1
#5 MNO 22400           1
#6 PQR 26500           1
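
For comparison, the first loop in the question can also be made to work by looking each data frame up by its name. This is only a sketch, assuming the data frames are called df1, df2 and df3 and live in the environment where the loop runs; the loop variable nm and the temporary d are names chosen here for illustration. It uses get() to fetch an object from its name and assign() to store the modified copy back.

```r
# Recreate the three example data frames from the question.
df1 <- data.frame(Q1 = c("ABC", "DEF", "GHI", "DEF", "JKL", "XYZ"),
                  Q2 = c(21000, 23400, 26800, 26000, 20400, 30800))
df2 <- data.frame(Q1 = c("DEF", "JKL", "XYZ", "ABC", "MNO", "PQR"),
                  Q2 = c(30100, 20200, 15800, 21000, 23400, 26800))
df3 <- data.frame(Q1 = c("ABC", "DEF", "GHI", "XYZ", "MNO", "PQR"),
                  Q2 = c(17800, 23060, 13080, 27000, 22400, 26500))

for (nm in paste0("df", 1:3)) {
  d <- get(nm)   # fetch the data frame whose name is stored in nm
  d$Q2_mean_nom <- ifelse(d$Q2 > mean(d$Q2, na.rm = TRUE), "1", "0")
  assign(nm, d)  # store the modified copy back under the same name
}
```

Keeping the data frames in a list, as in the code above, is usually easier to maintain than get() and assign(), but this version stays closest to what the question attempted.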

Upvotes: 1

user14692575

Reputation:

Your base R approach works with a slight modification: build the list with list() instead of c().

# use list() instead of c() so each data frame stays one list element:
newlist <- list(df1, df2, df3)
# seq_along() is safer than 1:length(): it yields an empty sequence for an empty list
for (i in seq_along(newlist)) {
  newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE), "1", "0")
}
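
To see why the c() version in the question failed, it may help to compare what c() and list() return for data frames. The sketch below uses two cut-down data frames; the names flat and nested are made up for this illustration.

```r
df1 <- data.frame(Q1 = c("ABC", "DEF"), Q2 = c(21000, 23400))
df2 <- data.frame(Q1 = c("JKL", "XYZ"), Q2 = c(30100, 20200))

flat   <- c(df1, df2)     # concatenates the columns into one plain list
nested <- list(df1, df2)  # keeps each data frame as a single element

length(flat)                # 4 (the columns Q1, Q2, Q1, Q2)
is.data.frame(flat[[1]])    # FALSE: a single column, not a data frame
length(nested)              # 2
is.data.frame(nested[[1]])  # TRUE
```

Because each element of flat is a bare column, newlist[[i]]$Q2 has nothing to select from in the original loop, whereas each element of nested is a full data frame.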

EDIT (see comments):

# df1:
Q1 <- c('ABC','DEF','GHI', 'DEF','JKL','XYZ')
Q2 <- c(21000, 23400, 26800, 26000, 20400, 30800)
df1 <- data.frame(Q1, Q2)
# df2:
Q1 <- c('DEF','JKL','XYZ', 'ABC', 'MNO', 'PQR')
Q2 <- c(30100, 20200, 15800, 21000, 23400, 26800)
df2 <- data.frame(Q1, Q2)
# df3:
Q1 <- c('ABC','DEF','GHI', 'XYZ', 'MNO', 'PQR')
Q2 <- c(17800, 23060, 13080, 27000, 22400, 26500)
df3 <- data.frame(Q1, Q2)

# use list() instead of c() so each data frame stays one list element:
newlist <- list(df1, df2, df3)
# seq_along() is safer than 1:length(): it yields an empty sequence for an empty list
for (i in seq_along(newlist)) {
  newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE), "1", "0")
}

newlist # displays output, see next chunk.
# output:
#> [[1]]
#>    Q1    Q2 Q2_mean_nom
#> 1 ABC 21000           0
#> 2 DEF 23400           0
#> 3 GHI 26800           1
#> 4 DEF 26000           1
#> 5 JKL 20400           0
#> 6 XYZ 30800           1
#> 
#> [[2]]
#>    Q1    Q2 Q2_mean_nom
#> 1 DEF 30100           1
#> 2 JKL 20200           0
#> 3 XYZ 15800           0
#> 4 ABC 21000           0
#> 5 MNO 23400           1
#> 6 PQR 26800           1
#> 
#> [[3]]
#>    Q1    Q2 Q2_mean_nom
#> 1 ABC 17800           0
#> 2 DEF 23060           1
#> 3 GHI 13080           0
#> 4 XYZ 27000           1
#> 5 MNO 22400           1
#> 6 PQR 26500           1

Created on 2021-09-11 by the reprex package (v2.0.1)
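
As a small illustration of the seq_along() remark in the code comments, here is a sketch of how the two loop styles behave on an empty list. The counters n_colon and n_seq are hypothetical names used only to count iterations.

```r
empty <- list()

n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1  # 1:0 counts down, so i = 1, then i = 0

n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1     # empty sequence, the body never runs

n_colon  # 2
n_seq    # 0
```

With the three data frames in this thread both forms behave the same; the difference only shows up when the list happens to be empty.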

Upvotes: 1

TarJae

Reputation: 78927

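We could use map from the purrr package:

  1. Save your data frames in a list,
  2. then iterate over each data frame with map, and
  3. use mutate to add the new column with your ifelse condition.

library(purrr)
library(dplyr)

# list_df was not defined in the original snippet; it is assumed to hold the three data frames
list_df <- list(df1, df2, df3)

list_df %>% 
    map(~mutate(., Q2_mean_nom = ifelse(Q2 > mean(Q2, na.rm = TRUE), 1, 0)))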
[[1]]
   Q1    Q2 Q2_mean_nom
1 ABC 21000           0
2 DEF 23400           0
3 GHI 26800           1
4 DEF 26000           1
5 JKL 20400           0
6 XYZ 30800           1

[[2]]
   Q1    Q2 Q2_mean_nom
1 DEF 30100           1
2 JKL 20200           0
3 XYZ 15800           0
4 ABC 21000           0
5 MNO 23400           1
6 PQR 26800           1

[[3]]
   Q1    Q2 Q2_mean_nom
1 ABC 17800           0
2 DEF 23060           1
3 GHI 13080           0
4 XYZ 27000           1
5 MNO 22400           1
6 PQR 26500           1

Upvotes: 1
