Zach
Zach

Reputation: 1396

Assocating Excel file name with dplyr::bind_rows .id

I have about 50-60 Excel workbooks that are sitting in a directory. These Excel files are by and large all the same. There are two work sheets, one with instructions, another with the same tidy data across all sheets. Columns/vars A through J are the data I actually want to extract but I'm willing to read it all into a data.frame. The data go out as far as A through N cols.

I'm writing a script to extract all of the raw data and so far so good. Part of my script uses the standard approach of list.files to build a 'df' of file names. Then, I use 'lapply' to read all of the excel files into a list. Now here comes the rub.

I want to use dplyr::bind_rows (but am open to other suggestions) to then bind all of the rows together since they are the same and share the same headers. This works fantastic in my proof of concept. When I use .id argument of dplyr::bind_rows on it, I get 1 through j as the id var, which corresponds to the data.frame position within the list.

files.list <- list.files(pattern='*.xlsx') # list file names in directory
df.list <- lapply(files.list, read_excel) # read excel into a list of dfs
df <- bind_rows(df.list, .id = "id") # bind the rows of the dfs together

Is it possible to get the actual file name as opposed to the position in the list from where the data.frame was read? If so, how would I do that?

Upvotes: 5

Views: 2173

Answers (1)

akuiper
akuiper

Reputation: 214977

Try rename the list of data frames with the file names using setNames; From ?bind_rows: .id labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.

files.list <- list.files(pattern='*.xlsx')
df.list <- setNames(lapply(files.list, read_excel), files.list)
df <- bind_rows(df.list, .id = "id")

Upvotes: 10

Related Questions