k3b
k3b

Reputation: 354

Return the names of data frames from a loop into a new data frame as IDs

Suppose I have multiple data frames with the same prefixes and same structure.

mydf_1 <- data.frame('fruit' = 'apples', 'n' = 2)
mydf_2 <- data.frame('fruit' = 'pears', 'n' = 0)
mydf_3 <- data.frame('fruit' = 'oranges', 'n' = 3)

I have a for-loop that grabs all the tables with this prefix, and appends those that match a certain condition.

res <- data.frame()

for(i in mget(apropos("^mydf_"), envir = .GlobalEnv)){
  
  if(sum(i$n) > 0){
    res <- rbind.data.frame(res, data.frame('name' = paste0(i[1]),
                                            'n' = sum(i$n)))
  }
}

res

This works fine, but I want my 'res' table to identify the name of the original data frame itself in the 'name' column, instead of the column name. My desired result is:

enter image description here

The closest I have gotten to solving this issue is:

'name' = paste0(substitute(i))

instead of

'name' = paste0(i[1])

but it just returns 'i'.

Any simple solution? Base preferred but not essential.

Upvotes: 1

Views: 283

Answers (2)

AndrewGB
AndrewGB

Reputation: 16856

As mentioned in the comments, it is better to put dataframes into a list as it much easier to handle and manipulate them. However, we could still grab the dataframes from the global environment, get the sum for each dataframe, then bind them together and add the dataframe name as a row.

library(tidyverse)

df_list <-
  do.call("list", mget(grep("^mydf_", names(.GlobalEnv), value = TRUE))) %>%
  map(., ~ .x %>% summarise(n = sum(n))) %>%
  discard(~ .x == 0) %>% 
  bind_rows(., .id = "name")

Or we could use map_dfr to bind together and summarise, then filter out the 0 values:

map_dfr(mget(ls(pattern = "^mydf_")), ~ c(n = sum(.x$n)), .id = "name") %>%
  filter(n != 0)

Output

    name n
1 mydf_1 2
2 mydf_3 3

Upvotes: 1

Darren Tsai
Darren Tsai

Reputation: 35554

To bind a list of data.frames and store the list names as a new column, a convenient way is to set the arg .id in dplyr::bind_rows().

library(dplyr)

mget(apropos("^mydf_")) %>%
  bind_rows(.id = "name") %>%
  count(name, wt = n) %>%
  filter(n > 0)

#     name n
# 1 mydf_1 2
# 2 mydf_3 3

Upvotes: 1

Related Questions