Reputation: 33
I'm new to R. I want to recode the same variable across multiple data frames, but I keep getting errors. Below is an example with 3 small data frames. In each of them I want to create a new variable called Q2_mean_nom, coded "1" if Q2 is greater than mean(Q2) and "0" otherwise. My code is below.
df1:
Q1 <- c('ABC','DEF','GHI', 'DEF','JKL','XYZ')
Q2 <- c(21000, 23400, 26800, 26000, 20400, 30800)
df1 <- data.frame(Q1, Q2)
df2:
Q1 <- c('DEF','JKL','XYZ', 'ABC', 'MNO', 'PQR')
Q2 <- c(30100, 20200, 15800, 21000, 23400, 26800)
df2 <- data.frame(Q1, Q2)
df3:
Q1 <- c('ABC','DEF','GHI', 'XYZ', 'MNO', 'PQR')
Q2 <- c(17800, 23060, 13080, 27000, 22400, 26500)
df3 <- data.frame(Q1, Q2)
a <- c("Q1", "Q2", "Q3")
for (i in a) {
newname <- paste(i)
newname$Q2_mean_nom <- ifelse(newname$Q2 > mean(newname$Q2, na.rm = TRUE), "1", "0")
}
I noticed that in the loop above, newname is a character string rather than a df, so the mean won't run. Is there a way to make the loop recognise newname as a df?
I also tried using a list, but that didn't work either.
newlist <- c(df1, df2, df3)
for (i in 1:length(newlist)) {
newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE),
"1", "0")
}
Please help. Thank you very much!
Upvotes: 3
Views: 75
Reputation: 388982
Get the data in a named list using `mget` and `paste0`. Use `lapply` to create a new column in each data frame, then use `list2env` to have those changes reflected in the individual data frames.
In base R, you can do:
list_data <- mget(paste0('df', 1:3))
list_data <- lapply(list_data, function(x)
transform(x, Q2_mean_nom = as.integer(Q2 > mean(Q2, na.rm = TRUE))))
list2env(list_data, .GlobalEnv)
df1
# Q1 Q2 Q2_mean_nom
#1 ABC 21000 0
#2 DEF 23400 0
#3 GHI 26800 1
#4 DEF 26000 1
#5 JKL 20400 0
#6 XYZ 30800 1
df2
# Q1 Q2 Q2_mean_nom
#1 DEF 30100 1
#2 JKL 20200 0
#3 XYZ 15800 0
#4 ABC 21000 0
#5 MNO 23400 1
#6 PQR 26800 1
df3
# Q1 Q2 Q2_mean_nom
#1 ABC 17800 0
#2 DEF 23060 1
#3 GHI 13080 0
#4 XYZ 27000 1
#5 MNO 22400 1
#6 PQR 26500 1
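The loop-by-name idea from the question can also be made to work. The following is only a sketch, assuming the data frames are named df1 to df3 as in the question: `get()` looks up the object behind each name, and `assign()` writes the modified copy back under the same name. The variable names `nm` and `d` are made up for illustration.

```r
# Example data from the question
df1 <- data.frame(Q1 = c('ABC','DEF','GHI','DEF','JKL','XYZ'),
                  Q2 = c(21000, 23400, 26800, 26000, 20400, 30800))
df2 <- data.frame(Q1 = c('DEF','JKL','XYZ','ABC','MNO','PQR'),
                  Q2 = c(30100, 20200, 15800, 21000, 23400, 26800))
df3 <- data.frame(Q1 = c('ABC','DEF','GHI','XYZ','MNO','PQR'),
                  Q2 = c(17800, 23060, 13080, 27000, 22400, 26500))

for (nm in paste0("df", 1:3)) {
  d <- get(nm)                       # fetch the data frame named by the string
  d$Q2_mean_nom <- ifelse(d$Q2 > mean(d$Q2, na.rm = TRUE), "1", "0")
  assign(nm, d)                      # store the modified copy under the same name
}
```

This keeps the "1"/"0" character coding from the question. The `mget`/`lapply`/`list2env` version above does the same job without an explicit loop.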
Upvotes: 1
Reputation:
Your base R approach works with a slight modification: build the list with list() instead of c().
# list instead of vector:
newlist <- list(df1, df2, df3)
# `seq_along()` is safer than `1:length()`: it gives an empty sequence for an empty list
for (i in seq_along(newlist)) {
newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE), "1", "0")
}
EDIT (see comments):
# df1:
Q1 <- c('ABC','DEF','GHI', 'DEF','JKL','XYZ')
Q2 <- c(21000, 23400, 26800, 26000, 20400, 30800)
df1 <- data.frame(Q1, Q2)
# df2:
Q1 <- c('DEF','JKL','XYZ', 'ABC', 'MNO', 'PQR')
Q2 <- c(30100, 20200, 15800, 21000, 23400, 26800)
df2 <- data.frame(Q1, Q2)
# df3:
Q1 <- c('ABC','DEF','GHI', 'XYZ', 'MNO', 'PQR')
Q2 <- c(17800, 23060, 13080, 27000, 22400, 26500)
df3 <- data.frame(Q1, Q2)
# list instead of vector:
newlist <- list(df1, df2, df3)
# `seq_along()` is safer than `1:length()`: it gives an empty sequence for an empty list
for (i in seq_along(newlist)) {
newlist[[i]]$Q2_mean_nom <- ifelse(newlist[[i]]$Q2 > mean(newlist[[i]]$Q2, na.rm = TRUE), "1", "0")
}
newlist # displays output, see next chunk.
# output:
#> [[1]]
#> Q1 Q2 Q2_mean_nom
#> 1 ABC 21000 0
#> 2 DEF 23400 0
#> 3 GHI 26800 1
#> 4 DEF 26000 1
#> 5 JKL 20400 0
#> 6 XYZ 30800 1
#>
#> [[2]]
#> Q1 Q2 Q2_mean_nom
#> 1 DEF 30100 1
#> 2 JKL 20200 0
#> 3 XYZ 15800 0
#> 4 ABC 21000 0
#> 5 MNO 23400 1
#> 6 PQR 26800 1
#>
#> [[3]]
#> Q1 Q2 Q2_mean_nom
#> 1 ABC 17800 0
#> 2 DEF 23060 1
#> 3 GHI 13080 0
#> 4 XYZ 27000 1
#> 5 MNO 22400 1
#> 6 PQR 26500 1
Created on 2021-09-11 by the reprex package (v2.0.1)
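To see why `c()` did not work in the question, here is a small sketch on two cut-down data frames (the names `flat` and `nested` are just for illustration). `c()` flattens data frames into one list of their columns, while `list()` keeps each data frame intact as one element.

```r
df1 <- data.frame(Q1 = c("ABC", "DEF"), Q2 = c(21000, 23400))
df2 <- data.frame(Q1 = c("DEF", "JKL"), Q2 = c(30100, 20200))

flat <- c(df1, df2)          # columns of both data frames, side by side
length(flat)                 # 4
names(flat)                  # "Q1" "Q2" "Q1" "Q2"
is.data.frame(flat[[1]])     # FALSE: each element is a plain vector

nested <- list(df1, df2)     # one element per data frame
length(nested)               # 2
is.data.frame(nested[[1]])   # TRUE
```

With the flattened version, `newlist[[i]]` is a vector, so `newlist[[i]]$Q2` cannot work.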
Upvotes: 1
Reputation: 78927
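We could use `map` from the purrr package: `map` over each data frame and `mutate` the new column with your `ifelse` condition.
library(purrr)
library(dplyr)
# assumes the data frames are first collected in a list
list_df <- list(df1, df2, df3)
list_df %>%
map(~mutate(., Q2_mean_nom = ifelse(Q2 > mean(Q2, na.rm = TRUE), 1, 0)))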
[[1]]
Q1 Q2 Q2_mean_nom
1 ABC 21000 0
2 DEF 23400 0
3 GHI 26800 1
4 DEF 26000 1
5 JKL 20400 0
6 XYZ 30800 1
[[2]]
Q1 Q2 Q2_mean_nom
1 DEF 30100 1
2 JKL 20200 0
3 XYZ 15800 0
4 ABC 21000 0
5 MNO 23400 1
6 PQR 26800 1
[[3]]
Q1 Q2 Q2_mean_nom
1 ABC 17800 0
2 DEF 23060 1
3 GHI 13080 0
4 XYZ 27000 1
5 MNO 22400 1
6 PQR 26500 1
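One detail worth checking is the type of the new column, since the printed output looks the same either way. A small base R sketch on the Q2 values of df1 (the names `above`, `as_char`, `as_num` and `as_int` are made up here):

```r
Q2 <- c(21000, 23400, 26800, 26000, 20400, 30800)
above <- Q2 > mean(Q2, na.rm = TRUE)   # logical vector

as_char <- ifelse(above, "1", "0")     # coding used in the question
as_num  <- ifelse(above, 1, 0)         # coding used with mutate() above
as_int  <- as.integer(above)           # TRUE/FALSE converted to 1/0

class(as_char)   # "character"
class(as_num)    # "numeric"
class(as_int)    # "integer"
```

If the column is meant to be nominal, wrapping the result in `factor()` may be the better fit, depending on what is done with it later.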
Upvotes: 1