bshelt141
bshelt141

Reputation: 1223

Create a new field based on the file name in R

I have a number of .csv files that all contain the same fields that are housed in the same directory, but the values in each file are for a specific date. However, the data in the .csv files does not contain the date - only the file names contain the date.

I'm currently reading in and merging the different .csv files using the following code:

group <- list.files(pattern="[.]csv")
fun <- function(x) {
  df <- read.csv(x , header = TRUE)
}
big_df <- ldply(group, fun)

The output then looks something like this:

print(big_df)
  value_a value_b value_c
1       1     444      10
2       5    7364       9
3      32     573      14
4       7     253      16

and the files that it is reading from look like this:

print(group)
[1] "PSM_filter_report_04242015_1.csv" "PSM_filter_report_04242015_2.csv" "PSM_filter_report_04252015_1.csv"
[4] "PSM_filter_report_04272015_1.csv"

What I'd like to do as part of the process is use substr(), or some comparable function, to add the date from the file name as a new field in the file that I'm creating. By doing this, the new data frame would look like this:

big_df
  value_a value_b value_c     date
1       1     444      10 04242015
2       5    7364       9 04242015
3      32     573      14 04252015
4       7     253      16 04272015

So long as the date is read in as a factor or character (so that I don't lose the leading zero), I'll be comfortable converting the dates with as.Date().

Upvotes: 0

Views: 144

Answers (2)

Frash
Frash

Reputation: 728

You can do the same in base R, without using external library. Since the dates are of length 8 and formatted as '%m%d%Y', the following code would be fine.

# assuming only csv files are of interest
group <- list.files(path = '.', pattern=".csv$")
dates <- substr(group, regexpr('\\d', group), regexpr('\\d', group)+8-1)
dates <- strptime(dates, '%m%d%Y')
dates <- as.POSIXct(dates) #if you need POSIXct objects

You can replace the regexpr calls by direct from and to location, i.e. from=19, to=26

HTH

Upvotes: 0

jazzurro
jazzurro

Reputation: 23574

This would be one of the ways to get what you want. Since you have a systematic way to represent time information (i.e., there are eight digits for date information), you could do something like this.

library(stringr)

group <- c("PSM_filter_report_04242015_1.csv","PSM_filter_report_04242015_2.csv",
           "PSM_filter_report_04252015_1.csv", "PSM_filter_report_04272015_1.csv")

str_extract(group, "[0-9]{8}")

# [1] "04242015" "04242015" "04252015" "04272015"

Then, you can convert this character string to date later. Hope this helps you.

Upvotes: 1

Related Questions