Reputation: 1223
I have a number of .csv files that all contain the same fields that are housed in the same directory, but the values in each file are for a specific date. However, the data in the .csv files does not contain the date - only the file names contain the date.
I'm currently reading in and merging the different .csv files using the following code:
group <- list.files(pattern="[.]csv")
fun <- function(x) {
df <- read.csv(x , header = TRUE)
}
big_df <- ldply(group, fun)
The output then looks something like this:
print(big_df)
value_a value_b value_c
1 1 444 10
2 5 7364 9
3 32 573 14
4 7 253 16
and the files that it is reading from look like this:
print(group)
[1] "PSM_filter_report_04242015_1.csv" "PSM_filter_report_04242015_2.csv" "PSM_filter_report_04252015_1.csv"
[4] "PSM_filter_report_04272015_1.csv"
What I'd like to do as part of the process is use substr(), or some comparable function, to add the date from the file name as a new field in the file that I'm creating. By doing this, the new data frame would look like this:
big_df
value_a value_b value_c date
1 1 444 10 04242015
2 5 7364 9 04242015
3 32 573 14 04252015
4 7 253 16 04272015
So long as the date is read in as a factor or character (so that I don't lose the leading zero), I'll be comfortable converting the dates with as.Date().
Upvotes: 0
Views: 144
Reputation: 728
You can do the same in base R, without using external library. Since the dates are of length 8 and formatted as '%m%d%Y'
, the following code would be fine.
# assuming only csv files are of interest
group <- list.files(path = '.', pattern=".csv$")
dates <- substr(group, regexpr('\\d', group), regexpr('\\d', group)+8-1)
dates <- strptime(dates, '%m%d%Y')
dates <- as.POSIXct(dates) #if you need POSIXct objects
You can replace the regexpr
calls by direct from and to location, i.e. from=19, to=26
HTH
Upvotes: 0
Reputation: 23574
This would be one of the ways to get what you want. Since you have a systematic way to represent time information (i.e., there are eight digits for date information), you could do something like this.
library(stringr)
group <- c("PSM_filter_report_04242015_1.csv","PSM_filter_report_04242015_2.csv",
"PSM_filter_report_04252015_1.csv", "PSM_filter_report_04272015_1.csv")
str_extract(group, "[0-9]{8}")
# [1] "04242015" "04242015" "04252015" "04272015"
Then, you can convert this character string to date later. Hope this helps you.
Upvotes: 1