mykonos

Reputation: 73

rbind all data frames with common names based on list using lapply

I have several data frames named as such:

orange_ABC
orange_BCD
apple_ABC
apple_BCD
grape_ABC
grape_BCD

I need to rbind those that share the first part of their name (orange, apple, grape), and name the new data frames accordingly. I'm accessing the names via names(fruitlist), the list of data frames from which I made the data frames above, and have tried using lapply with function(x) with no luck. I'm somewhat new to R, so I think I'm making a simple mistake when it comes to dynamically naming the new data frame...

lapply(names(fruitlist),
       function(x){
         frame_nm <- toString((names(fruitlist[x])))
         frame_nm <- do.call(rbind, mget(ls(pattern=paste0((names(splitlist[x])),"*"))))
})

I've tried the standalone line on one type of "fruit" and it seems to work:

test_DF <- do.call(rbind, mget(ls(pattern="apple*")))

EDIT: I realize I forgot to mention that the 6 example data frames were created dynamically, so I can't simply generate a list of them. However, I do have a list of the "fruits", and all possible endings of the data frame names are known ("_ABC" and "_BCD").

Upvotes: 1

Views: 1716

Answers (3)

Luke C

Reputation: 10301

If your fruitlist is a named list of data frames, maybe this will suit.

First, group the names that share a prefix into their own list:

fruit.groups <- split(names(fruitlist), 
                      sapply(strsplit(names(fruitlist), split = "_"), "[[", 1))

> fruit.groups
$apple
[1] "apple_ABC" "apple_BCD"

$grape
[1] "grape_ABC" "grape_BCD"

$orange
[1] "orange_ABC" "orange_BCD"
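The grouping key is simply the part of each name before the first underscore. A minimal sketch of that step on its own, using a hypothetical vector nms in place of names(fruitlist), and showing sub() as an equivalent way to extract the prefix:

```r
# Hypothetical stand-in for names(fruitlist)
nms <- c("orange_ABC", "apple_ABC", "grape_ABC",
         "orange_BCD", "apple_BCD", "grape_BCD")

# Prefix via strsplit(), as in the answer: first piece of each split
prefix_split <- sapply(strsplit(nms, split = "_"), "[[", 1)

# Same prefix via a regular expression: drop everything from the first "_"
prefix_sub <- sub("_.*", "", nms)

# split() groups the full names by prefix; groups come out in sorted order
groups <- split(nms, prefix_sub)
groups$apple  # "apple_ABC" "apple_BCD"
```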

Then, use lapply to rbind by group:

fdf <- lapply(fruit.groups, function(x){
  out <- do.call(rbind, fruitlist[x])
  out$from <- gsub("(\\..*)", "", rownames(out))
  rownames(out) <- NULL
  return(out)
})

> fdf$apple
  a  b      from
1 1 11 apple_ABC
2 2 12 apple_ABC
3 3 13 apple_ABC
4 4 14 apple_ABC
5 1 11 apple_BCD
6 2 12 apple_BCD
7 3 13 apple_BCD
8 4 14 apple_BCD
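The from column relies on how rbind() names rows when do.call() hands it a named list: each row name becomes the list element's name, a dot, and the original row number. A small check of that behaviour with hypothetical toy data:

```r
# Two small data frames in a named list, mimicking fruitlist[x]
parts <- list(apple_ABC = data.frame(a = 1:2),
              apple_BCD = data.frame(a = 3:4))

combined <- do.call(rbind, parts)
rownames(combined)  # "apple_ABC.1" "apple_ABC.2" "apple_BCD.1" "apple_BCD.2"

# Stripping everything from the first dot recovers the source name
source_name <- gsub("(\\..*)", "", rownames(combined))
```

Note that this trick cuts at the first dot, so it would truncate a data frame name that itself contains a dot (e.g. apple.v2_ABC).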

Fake data:

namelist <- paste(fruit = rep(c("orange", "apple", "grape"), 2), 
                  nums =  rep(c("_ABC", "_BCD"), each =  3), sep = "")

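# fruitlist must be a *named* list, otherwise split() above has nothing to
# group on. (assign() here only creates a local copy inside the function;
# the value it returns is what ends up in the list.)
fruitlist <- setNames(lapply(namelist, function(x){
  assign(x, data.frame(a = 1:4, b = 11:14))
}), namelist)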

EDIT:

From the edits to your question above:

If you have the fruits and suffixes, use expand.grid to get all possible combinations (assuming that every combination corresponds to one of the dynamically generated data frames).

fruits <- c("orange", "apple", "grape")
suffixes <- c("_ABC", "_BCD")
fullnames <- apply(expand.grid(fruits, suffixes), 1, paste, collapse = "")
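expand.grid() plus apply(..., paste) is one way to build every fruit/suffix combination. A sketch of an alternative using outer() with paste0(), which yields the same names in the same order (expand.grid varies its first argument fastest, and as.vector() flattens the outer() matrix column by column) without the detour through a data frame:

```r
fruits   <- c("orange", "apple", "grape")
suffixes <- c("_ABC", "_BCD")

# As in the answer: one row per combination, pasted row-wise
via_grid <- apply(expand.grid(fruits, suffixes), 1, paste, collapse = "")

# outer() applies paste0 to every fruit/suffix pair and returns a 3 x 2
# matrix; as.vector() flattens it column by column
via_outer <- as.vector(outer(fruits, suffixes, paste0))
```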

Pass that vector of names to mget to collect the existing data frames into a named list.

new_fruit_df_list <- mget(fullnames)
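mget(fullnames) stops with an error if any of the names has no matching object. If some fruit/suffix pairs might be missing, one option (a sketch, assuming missing objects should simply be skipped) is ifnotfound = list(NULL) followed by dropping the NULL entries. Here a throwaway environment env stands in for the global environment:

```r
# Throwaway environment standing in for the global environment
env <- new.env()
assign("apple_ABC", data.frame(a = 1:2), envir = env)
assign("apple_BCD", data.frame(a = 3:4), envir = env)
# note: no grape_* objects exist

fullnames <- c("apple_ABC", "apple_BCD", "grape_ABC", "grape_BCD")

# Missing names come back as NULL instead of raising an error
found <- mget(fullnames, envir = env, ifnotfound = list(NULL))

# Keep only the objects that actually exist
found <- Filter(Negate(is.null), found)
names(found)  # "apple_ABC" "apple_BCD"
```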

Then, the code from above should work, modified here to reflect the name changes:

fruit.groups <- split(names(new_fruit_df_list),
                      sapply(strsplit(names(new_fruit_df_list), split = "_"), "[[", 1))

fdf <- lapply(fruit.groups, function(x){
  out <- do.call(rbind, new_fruit_df_list[x])
  out$from <- gsub("(\\..*)", "", rownames(out))
  rownames(out) <- NULL
  return(out)
})

Have a look at the first rows of each. The added from column (remove it if you don't need it) shows the name of the data frame each row came from.

> lapply(fdf, head, 2)
$apple
  a  b      from
1 1 11 apple_ABC
2 2 12 apple_ABC

$grape
  a  b      from
1 1 11 grape_ABC
2 2 12 grape_ABC

$orange
  a  b       from
1 1 11 orange_ABC
2 2 12 orange_ABC
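Here the from column is recovered from the row names after binding. An alternative sketch (not the answer's method) tags each data frame before binding, which does not depend on row-name formatting and survives names that contain dots; bind_with_source is a hypothetical helper name:

```r
# Hypothetical helper: add a 'from' column to each data frame, then rbind
bind_with_source <- function(dfs) {
  # Map() walks the data frames and their names in parallel
  tagged <- Map(function(df, nm) {
    df$from <- nm
    df
  }, dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

# Toy input; note the dot in one of the names
dfs <- list(apple.v2_ABC = data.frame(a = 1:2),
            apple_BCD    = data.frame(a = 3:4))
res <- bind_with_source(dfs)
res$from  # "apple.v2_ABC" "apple.v2_ABC" "apple_BCD" "apple_BCD"
```

With the answer's objects this would be called as lapply(fruit.groups, function(x) bind_with_source(fruitlist[x])).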

Upvotes: 1

omahdi

Reputation: 630

As suspected, the proposed way of assigning values to dynamically named objects does not work: frame_nm <- ... simply overwrites a local variable called frame_nm instead of creating an object whose name is stored in it. Moreover, care has to be taken when using ls() and mget() to list and access named objects within a function: they do not automatically ascend to parent environments and only "see" variables in the local scope unless told otherwise. This applies to R version 3.4; older versions may behave differently.
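A small check of the scoping behaviour described above: called inside a function, ls() lists only that function's local variables unless envir is given. The object apple_ABC and the two functions are toy examples:

```r
# Create a toy object in the global environment
assign("apple_ABC", data.frame(a = 1:2), envir = globalenv())

# Without envir, ls() looks at the function's own (local) environment
f_local <- function() ls(pattern = "^apple_")
# With envir = globalenv(), it looks where apple_ABC actually lives
f_global <- function() ls(pattern = "^apple_", envir = globalenv())

f_local()   # character(0)
f_global()  # includes "apple_ABC"
```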

  1. Creating named objects.

    In order to create new objects in the global environment, use assign() (already suggested in Luke C's answer):

    > assign("foo", "some text")
    > foo
    [1] "some text"
    

    Placing code inside a function induces a local scope. Explicitly specifying the global environment allows setting global variables:

    > set_foo <- function (x) { assign("foo", x, envir=globalenv()) }
    > set_foo("other text")
    > foo
    [1] "other text"
    

    Note that omitting the envir argument would leave the global environment unaffected.

  2. Use of ls()/mget() within a local function.

    By default, this only lists names from the current (local) environment of the function, which in the question's code holds little more than the argument x. As above, a quick fix is to specify the global environment explicitly by adding the argument envir=globalenv(). The same applies to mget().
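Rather than calling assign() once per group, another option (a sketch of a common pattern, not this answer's code) is to keep all results in a named list and, only if separate objects are really needed, copy them into an environment in one step with list2env(). A throwaway environment target stands in for globalenv():

```r
# Named list of results, e.g. one combined data frame per fruit
results <- list(apple = data.frame(a = 1:4),
                grape = data.frame(a = 5:8))

# Copy every list element into the environment as its own object
target <- new.env()
invisible(list2env(results, envir = target))

exists("apple", envir = target, inherits = FALSE)  # TRUE
```

Keeping the results in a list also makes it easy to keep iterating over them with lapply().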

Since no MWE was provided, I am taking the liberty of adapting the "fake data" example code provided in Luke C's answer.

# Populate environment
namelist <- paste(fruit = rep(c("orange", "apple", "grape"), 2), 
                  nums =  rep(c("_ABC", "_BCD"), each =  3), sep = "")
for(x in namelist)
  assign(x, data.frame(a = 1:4, b = 11:14))

# The following re-generates the list of fruits used above
grouplist <- unique(unlist(lapply(strsplit(namelist, "_"), function (x) { x[[1]] })))
# Group and rbind by prefix, suppressing output
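invisible(lapply(grouplist,
       function(x) {
         # "^" anchors the prefix and the "_" is required, so the new object
         # named x (e.g. "apple") is not picked up again on a rerun
         grouped <- do.call(rbind,
           mget(ls(pattern=paste0("^", x, "_"), envir=globalenv()),
             envir=globalenv()))
         assign(x, grouped, envir=globalenv())
}))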
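The pattern argument of ls() is a regular expression, not a wildcard, so * repeats the preceding character: "apple_*" means "apple followed by zero or more underscores". An unanchored pattern of that form therefore also matches a plain apple object (such as the one created by assign(x, grouped, ...)) and names like pineapple_ABC. A quick check with grepl():

```r
nms <- c("apple", "apple_ABC", "pineapple_ABC")

# Unanchored: "_*" may match zero underscores, so all three match
grepl("apple_*", nms)   # TRUE TRUE TRUE

# Anchored at the start and requiring the underscore: only the intended one
grepl("^apple_", nms)   # FALSE TRUE FALSE
```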

Upvotes: 1

AidanGawronski

Reputation: 2085

Give this a try:

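# Object names containing an underscore, reduced to their unique prefixes
file_groups <- unique(sub("_.*", "", grep("_", ls(), value = TRUE)))
# Inside the function, ls()/mget() must be told to look in the global environment
df_list <- lapply(setNames(file_groups, file_groups), function(x){
  do.call(rbind, mget(ls(pattern = paste0("^", x, "_"), envir = globalenv()),
                      envir = globalenv()))
})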
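Wrapping the same idea in a function makes the environment explicit instead of relying on where ls() happens to be called from. A minimal sketch, assuming every object to combine is a data frame named prefix_suffix and lives in a single environment; rbind_by_prefix is a hypothetical name:

```r
# Hypothetical helper: rbind all data frames in `env` that share the part
# of their name before the first underscore
rbind_by_prefix <- function(env = globalenv()) {
  nms <- ls(envir = env, pattern = "_")
  # keep only objects that are data frames
  is_df <- vapply(nms, function(n) is.data.frame(get(n, envir = env)), logical(1))
  nms <- nms[is_df]
  groups <- split(nms, sub("_.*", "", nms))
  lapply(groups, function(g) do.call(rbind, mget(g, envir = env)))
}

# Toy data in a throwaway environment
env <- new.env()
for (nm in c("orange_ABC", "apple_ABC", "orange_BCD", "apple_BCD"))
  assign(nm, data.frame(a = 1:4, b = 11:14), envir = env)

res <- rbind_by_prefix(env)
names(res)       # "apple" "orange"
nrow(res$apple)  # 8
```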

Upvotes: 0
