James Martherus

Reputation: 1043

Using substring of file name to create new variable in a list of dataframes

I have a directory with a set of .rds files containing dataframes:

files <- c("file_2022-11-30.rds", "file_2022-12-01.rds")

I want to read each file into a list and then add a new column to each dataframe in the list that contains part of the name of the file it was loaded from (the date). I know how to do this with a for loop, but I'm looking for a more concise solution. I'm sure there's a way to do this with lapply, but this doesn't work:

library(dplyr)

df_list <- lapply(files, readRDS) %>%
  lapply(FUN = function(x) mutate(date = as.Date(stringr::str_sub(files[x], start = -14, end = -5)))) %>%
  bind_rows()

Desired output would look something like this:

   var1       date
1     1 2022-11-30
2     2 2022-11-30
3     2 2022-11-30
4     1 2022-11-30
5     2 2022-11-30
6     2 2022-12-01
7     1 2022-12-01
8     2 2022-12-01
9     1 2022-12-01
10    2 2022-12-01

Upvotes: 1

Views: 29

Answers (1)

akrun

Reputation: 886948

We can call as.Date on the file names with a matching format string to get a vector of class Date. Then read each file with readRDS, cbind the corresponding date onto each dataframe with Map, and rbind the list elements:

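# Parse the date straight out of each file name
dates <- as.Date(files, format = "file_%Y-%m-%d.rds")
# Read each file, attach its date as a `date` column, then stack the results
do.call(rbind, Map(cbind, lapply(files, readRDS), date = dates))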
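The format string has to match the file name from its first character, so this is worth checking before reading any data. A minimal sketch, assuming the files might be given with a directory prefix (the `data/` paths and the `notes.rds` name below are made up for illustration):

```r
# Hypothetical paths; only the file-name pattern comes from the question.
paths <- c("data/file_2022-11-30.rds", "data/file_2022-12-01.rds")

# basename() drops the directory so the format string lines up with the name.
dates <- as.Date(basename(paths), format = "file_%Y-%m-%d.rds")

# Without basename() the leading "data/" does not match "file_", so we get NA.
unparsed <- as.Date(paths, format = "file_%Y-%m-%d.rds")

# A name that does not follow the pattern also gives NA rather than an error.
bad <- as.Date("notes.rds", format = "file_%Y-%m-%d.rds")
```

Checking `anyNA(dates)` before binding is a cheap way to catch files that do not follow the naming scheme.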

Or, with the tidyverse:

library(purrr)
library(dplyr)
# Read each file, add its date as a `date` column, and row-bind the results
map2_dfr(files, dates, ~ readRDS(.x) %>%
           mutate(date = .y))

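In the OP's code, files[x] doesn't work because x is not an index; it is the list element, i.e. the dataframe returned by readRDS, which carries no information about the file it came from. In addition, mutate is never given the dataframe as its first argument. Instead, we can read the file and add the column in a single lapply over the file names: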

lapply(files, function(x)
  readRDS(x) %>%
    mutate(date = as.Date(stringr::str_sub(x, start = -14, end = -5)))) %>%
  bind_rows()
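To try the approach end to end without the original data, here is a base-R sketch that writes two small .rds files to a temporary directory and reads them back. The `rds_demo` directory name and the `var1` values are made up; only the file-name pattern comes from the question.

```r
# Create two throwaway .rds files following the question's naming scheme.
tmp <- file.path(tempdir(), "rds_demo")  # hypothetical scratch directory
dir.create(tmp, showWarnings = FALSE)
saveRDS(data.frame(var1 = c(1, 2, 2)), file.path(tmp, "file_2022-11-30.rds"))
saveRDS(data.frame(var1 = c(2, 1)), file.path(tmp, "file_2022-12-01.rds"))

# list.files() returns full paths here, in alphabetical order.
files <- list.files(tmp, pattern = "^file_.*\\.rds$", full.names = TRUE)

# Read each file and tag it with the date parsed from its own name.
df_list <- lapply(files, function(f) {
  df <- readRDS(f)
  df$date <- as.Date(basename(f), format = "file_%Y-%m-%d.rds")
  df
})

# Stack the tagged dataframes into one.
result <- do.call(rbind, df_list)
```

Because the function receives the file path rather than the already-loaded dataframe, the name is still available when the column is created, which is the same idea as the single lapply above.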

Upvotes: 1
