Reputation: 569
I have a list of data frames in R. I need to apply a function to each data frame (in this case, removing special characters) and get back a list of data frames.
Using lapply and as.data.frame, the following works fine and delivers exactly what I need:
my_df = data.frame(names = seq(1, 10), chars = c("abcabc!!", "abcabc234234!!"))
my_list = list(my_df, my_df, my_df)
#str(my_list)
List of 3
$ :'data.frame': 10 obs. of 2 variables: ...
new_list <- lapply(my_list, function(y) as.data.frame(lapply(y, function(x) gsub("[^[:alnum:][:space:]']", "", x))))
# str(new_list)
List of 3
$ :'data.frame': 10 obs. of 2 variables:
..$ names: Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
..$ chars: Factor w/ 2 levels "abcabc","abcabc234234": 1 2 1 2 1 2 1 2 1 2
$ :'data.frame': 10 obs. of 2 variables:
..$ names: Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
..$ chars: Factor w/ 2 levels "abcabc","abcabc234234": 1 2 1 2 1 2 1 2 1 2
$ :'data.frame': 10 obs. of 2 variables:
..$ names: Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
..$ chars: Factor w/ 2 levels "abcabc","abcabc234234": 1 2 1 2 1 2 1 2 1 2
But I am wondering if there is a more efficient way that doesn't require a nested lapply. Perhaps a different apply-family function that returns the elements as a data frame?
Upvotes: 4
Views: 244
Reputation: 1544
While @akrun is right that your second lapply call is unnecessary in this example, that approach does not cover the general case where many columns might be relevant and it is not known in advance which ones.
What is inefficient here is the conversion back with as.data.frame, not the inner lapply call. The lapply call itself is almost as fast as applying the function to a single vector or a matrix of the same size.
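If the as.data.frame conversion is the part you want to avoid, one base-R option is to assign the result of the inner lapply back into the columns of the existing data frame, so the object never stops being a data frame. The following is only a minimal sketch: the helper name clean_df, the variable is_chr and the small sample data are made up for illustration, and I have not timed it against the versions in this answer.

```r
# Drop everything except letters, digits, whitespace and apostrophes.
f <- function(x) gsub("[^[:alnum:][:space:]']", "", x)

# Small illustrative data (hypothetical, not the benchmark data).
my_df <- data.frame(id = 1:4,
                    chars = c("abcabc!!", "abc 234!!", "it's ok?", "plain"),
                    stringsAsFactors = FALSE)
my_list <- list(my_df, my_df, my_df)

clean_df <- function(y) {
  # Only touch character columns, so numeric columns keep their type.
  is_chr <- vapply(y, is.character, logical(1))
  # Assigning into y[...] keeps y a data frame; no as.data.frame() call.
  y[is_chr] <- lapply(y[is_chr], f)
  y
}

new_list <- lapply(my_list, clean_df)
str(new_list[[1]])
```

Restricting the replacement to character columns also means you do not need to know in advance which columns contain special characters.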
If you really want to be more time-efficient here, I would suggest using data.table. I've made the example a bit larger so we can time it.
library(data.table)
f <- function(x) gsub("[^[:alnum:][:space:]']", "", x)
my_df <- as.data.frame(matrix(paste0(sample(c(letters, '!'), size = 1000000, replace = TRUE),
                                     sample(c(letters, '!'), size = 1000000, replace = TRUE)),
                              ncol = 250), stringsAsFactors = FALSE)
my_list = list(my_df, my_df, my_df)
system.time(lapply(my_list, function(y) as.data.frame(lapply(y, f))))
# 2.256 seconds
my_dt <- as.data.table(my_df)
my_list2 = list(my_dt, my_dt, my_dt)
system.time(lapply(my_list2, function(y) y[,lapply(.SD,f)]))
# 1.180 seconds
Upvotes: 1
Reputation: 887851
We don't need a nested lapply; a single lapply with transform does it:
lapply(my_list, transform, chars = gsub("[^[:alnum:][:space:]']", "", chars))
The pattern can be shortened to "[^[:alnum:] ']" if the only whitespace in the data is plain spaces, since [:space:] also matches tabs and newlines.
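To see when a shorter character class with a literal space, "[^[:alnum:] ']", behaves the same as the original pattern, here is a quick comparison on a few made-up strings (the vector x and the names strict and compact are purely illustrative):

```r
x <- c("abcabc!!", "it's ok?", "tab\there!")

# Original pattern: keeps any whitespace, including tabs.
strict <- gsub("[^[:alnum:][:space:]']", "", x)

# Shorter pattern: keeps only the plain space character.
compact <- gsub("[^[:alnum:] ']", "", x)

# TRUE where both patterns give the same result.
strict == compact
```

The two agree on the first two strings and differ on the third, where the shorter pattern also removes the tab.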
Upvotes: 4