Reputation: 1677
I'm trying to apply a very complex function to a list of more than 50 data frames. For clarity, the example below uses a very simple function that lowercases column names and just 3 data frames, but it follows my general approach.
[EDITED NAMES]
# Data sample. Every column name is different across data frames
quality <- data.frame(FIRST=c(1,5,3,3,2), SECOND=c(3,6,1,5,5))
thickness <- data.frame(THIRD=c(6,0,9,1,2), FOURTH=c(2,7,2,2,1))
distance <- data.frame(ONEMORE=c(0,0,1,5,1), ANOTHER=c(4,1,9,2,3))
# list of dataframes
dfs <- list(quality, thickness, distance)
# a very simple function (just for testing)
# actually a very complex one is used on real data
BetterNames <- function(x) {
names(x) <- tolower(names(x))
x
}
# apply function to data frame list
dfs <- lapply(dfs, BetterNames)
# I know the expected R behaviour is to modify a copy of the object,
# instead of the original object itself. So if you get the names
# you get the original version, not the needed one
names(quality)
[1] "FIRST" "SECOND"
Is there any way to apply a function in place, inside a loop or an "apply" call, to a large number of data frames?
As a result, each original data frame should be replaced by its modified version, for every data frame in the (big) list.
I know there's a trick using data.table, but I wonder whether this is possible in base R.
Expected Results:
names(quality)
[1] "first" "second"
[EDITED] I was pointed to this answer: Rename columns in multiple dataframes, R
But it doesn't work in my case. I can't use a vector of name strings because my new names are not a fixed list of strings.
[EDITED DATA]
for(df in dfs) {
df.tmp <- get(df)
names(df.tmp) <- BetterNames(df)
assign(df, df.tmp)
}
> names(quality)
[1] "quality" NA
Thanks
Upvotes: 2
Views: 2320
Reputation: 1312
I'd use a simple yet effective parse & eval approach.
Let's use a for loop to compose a command that suits your needs:
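# assumes dfs holds the object names as strings, not the data frames themselves
dfs <- c("quality", "thickness", "distance")
for(df in dfs) {
# BetterNames() returns the whole data frame, so reassign the object itself
command <- paste0(df," <- BetterNames(",df,")")
# print(command)
eval(parse(text=command))
}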
names(quality)
[1] "first" "second"
names(thickness)
[1] "third" "fourth"
names(distance)
[1] "onemore" "another"
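The same loop can probably be written without building code strings, using get() and assign() on the object names. This is only a minimal sketch: the name vector df_names is a hypothetical helper, and BetterNames is assumed to be the one-argument function from the question, which returns the whole data frame.

```r
# Sample data and function from the question (two data frames for brevity)
quality <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Hypothetical vector holding the object names as strings
df_names <- c("quality", "thickness")

for (nm in df_names) {
  # get() fetches the object by name, assign() rebinds the name to the result
  assign(nm, BetterNames(get(nm)))
}

names(quality)
names(thickness)
```

Run at top level, assign() writes to the global environment by default, so the original objects are replaced by their modified copies.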
Upvotes: 2
Reputation: 145775
You already have the best case scenario:
Let's add some names to your list:
names(dfs) <- c("quality", "thickness", "distance")
dfs <- lapply(dfs, BetterNames)
dfs[["quality"]]
#   first second
# 1     1      3
# 2     5      6
# 3     3      1
# 4     3      5
# 5     2      5
This works great. And all your data is in a list, so if there are other things you want to do to all your data frames it is very easy.
If you are done treating these data frames similarly and really want them back in the global environment to work with individually, you can do it with
list2env(dfs, envir = .GlobalEnv)
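Put together, the whole round trip could look roughly like the sketch below. It assumes the data frames already exist in the global environment and that their names are known as strings. The vector df_names is a hypothetical helper, and mget() is used here only as one way to get a named list without typing the names twice.

```r
# Sample data and function from the question
quality <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
distance <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Hypothetical vector of object names
df_names <- c("quality", "thickness", "distance")

# Collect the objects into a named list
dfs <- mget(df_names, envir = .GlobalEnv)

# Transform every element; the list names are preserved
dfs <- lapply(dfs, BetterNames)

# Write the modified copies back over the originals
invisible(list2env(dfs, envir = .GlobalEnv))

names(quality)
names(distance)
```

list2env() returns the environment, so it is wrapped in invisible() here just to keep the console quiet.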
I would recommend keeping them in a list though: in most cases if you have 50 data frames you are working with, in a list it is easy to use lapply or for loops to use them, but as individual objects you will be copy/pasting code and making mistakes.
I would consider even starting with 50 data frames in your workspace a problem - see How do I make a list of data frames? for recommendations on finding an upstream fix: going straight to a list from the start.
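As a rough illustration of starting from a list, the sample data could be created as a named list in the first place, so that no loose data frames ever need to be written back. The data here are just the question's sample values, used as a stand-in for whatever actually produces the 50 data frames.

```r
BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Build the data directly as a named list (sample values from the question)
dfs <- list(
  quality   = data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5)),
  thickness = data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
)

# One call transforms every data frame
dfs <- lapply(dfs, BetterNames)

# A for loop over the names works just as well when the name is needed
for (nm in names(dfs)) {
  cat(nm, ":", names(dfs[[nm]]), "\n")
}
```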
Upvotes: 2
Reputation: 3062
This is for sure not optimal and I hope something better comes up but here it goes:
BetterNames <- function(x, y) {
names(x) <- tolower(names(x))
assign(y, x, envir = .GlobalEnv)
}
dfs <- list(quality, thickness, distance)
dfs2 <- c("quality", "thickness", "distance")
mapply(BetterNames, dfs, dfs2)
> names(quality)
[1] "first" "second"
Upvotes: 0