Reputation: 15
I have 8 datasets and I want to apply a function to convert any number less than 5 to NA on 3 columns(var1,var2,var3) of each dataset. How can I write a function to do it effectively and faster ? I went through lots of such questions on Stack overflow but I didnt find any answer where specific columns were used. I have written the function to replace but cant figure out how to apply to all the datasets.
Input:
Data1
variable1 variable2 variable3 variable4
10 36 56 99
15 3 2 56
4 24 1 1
Expected output:
variable1 variable2 variable3 variable4
10 36 56 99
15 NA NA 56
NA 24 NA 1
Perform the same thing for 7 more datasets.
Till now I have stored the needed variables and datasets in two different list.
var1=enquo(variable1)
var2=enquo(variable2)
var3=enquo(variable3)
Total=3
listofdfs=list()
listofdfs_1=list()
for(i in 1:8) {
df=sym((paste0("Data",i)))
listofdfs[[i]]=df
}
for(e in 1:Ttoal) {
listofdfs[[e]]= eval(sym(paste0("var",e)))
}
The selected columns will go through this function:
temp_1=function(x,h) {
h=enquo(h)
for(e in 1:Total) {
if(substr(eval(sym(paste0("var",e))),1,3)=="var") {
y= x %>% mutate_at(vars(!!h), ~ replace(., which(.<=5),NA))
return(y)
}
}
}
I was expecting something :
lapply(for each dataset's selected columns,temp_1)
Upvotes: 0
Views: 653
Reputation: 146164
Here's a simple approach that should work:
cols_to_edit = paste0("var", 1:3)
result_list = lapply(list_of_dfs, function(x) {
x[cols_to_edit][x[cols_to_edit] < 5] = NA
return(x)
})
I assume your starting data is in a list called list_of_dfs
, that the names of columns to edit are the same in all data frames, and that you can construct a character vector cols_to_edit
with those names.
Upvotes: 1
Reputation: 76663
Here is a solution to the problem in the question.
First of all, create a test data set.
createData <- function(Total = 3){
numcols <- Total + 1
set.seed(1234)
for(i in 1:8){
tmp <- replicate(numcols, sample(10, 20, TRUE))
tmp <- as.data.frame(tmp)
names(tmp) <- paste0("var", seq_len(numcols))
assign(paste0("Data", i), tmp, envir = .GlobalEnv)
}
}
createData()
Now, the data transformation.
This is much easier if the many dataframes are in a "list"
.
df_list <- mget(ls(pattern = "^Data"))
I will present solutions, a base R solution and a tidyverse
one. Note that both solutions will use function temp_1
, written in base R only.
library(tidyverse)
temp_1 <- function(x, h){
f <- function(v){
is.na(v) <- v <= 5
v
}
x[h] <- lapply(x[h], f)
x
}
h <- grep("var[123]", names(df_list[[1]]), value = TRUE)
df_list1 <- lapply(df_list, temp_1, h)
df_list2 <- df_list %>% map(temp_1, h)
identical(df_list1, df_list2)
#[1] TRUE
Upvotes: 0