Kimberley
Kimberley

Reputation: 23

Combine multiple data frames with similar non-consecutive names

I have several data frames in R that have similar names: "datafile_20180801", "datafile_20180802",... --> they all start with 'datafile_' followed by the date of extraction. They have the same column names, how can I stack them into one data frame in R without having to type every single name into rbind? (I'm talking about 30+ data-frames)

Thanks in advance!

Upvotes: 1

Views: 608

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269501

The question did not provide reproducible data so we use the reproducible data in the Note at the end. The question states the column names are "similar and we have assumed that means that all data frames have the same column names.

Let env define the environment where the data frames are located. Below we assume that it is the global environment. (Note that the default for env in both ls and mget is the global environment so another possibility in that case is to just omit the env argument in both instances.) List the names in that environment that match the indicated regular expression giving Names. Then get the objects themselves using mget giving the list L and bind the components of L together using rbind. The row names of the result will indicate which data frame each row came from. No packages are used.

env <- .GlobalEnv
Names <- ls(pattern = "^datafile_\\d{8}$", env)
L <- mget(Names, env)
do.call("rbind", L)

giving the following using the data shown reproducibly in the Note at the end:

                    Time demand
datafile_20180801.1    1    8.3
datafile_20180801.2    2   10.3
datafile_20180801.3    3   19.0
datafile_20180801.4    4   16.0
datafile_20180801.5    5   15.6
datafile_20180801.6    7   19.8
datafile_20180802.1   10   83.0
datafile_20180802.2   20  103.0
datafile_20180802.3   30  190.0
datafile_20180802.4   40  160.0
datafile_20180802.5   50  156.0
datafile_20180802.6   70  198.0

An alternative to the last line of code above is to use bind_rows from dplyr. Instead of creating row names that identify the source of each row it will create a new id column to do that.

library(dplyr)
bind_rows(L, .id = "id")

giving the following but with warnings (which could be avoided if we convert the factor columns to character first):

                  id Time demand
1  datafile_20180801    1    8.3
2  datafile_20180801    2   10.3
3  datafile_20180801    3   19.0
4  datafile_20180801    4   16.0
5  datafile_20180801    5   15.6
6  datafile_20180801    7   19.8
7  datafile_20180802   10   83.0
8  datafile_20180802   20  103.0
9  datafile_20180802   30  190.0
10 datafile_20180802   40  160.0
11 datafile_20180802   50  156.0
12 datafile_20180802   70  198.0

We could express the above using magrittr if desired:

library(magrittr)

.GlobalEnv %>%
   mget(ls(pattern = "^datafile_\\d{8}$", .), .) %>%
   do.call("rbind", .)

Note

We have made use of the builtin data frame BOD to create reproducible input.

datafile_20180801 <- transform(BOD, Time = factor(Time))
datafile_20180802 <- transform(BOD, Time = factor(10*Time), demand = 10 * demand)

Upvotes: 1

Related Questions