Reputation: 35
I am new to R and have been struggling a lot with this problem. Tried to find a solution across places but couldn't.
I have a folder containing multiple csv files (about 158). Each csv has a column with date and time. I found out that the format of the date is not standard across csv files, which messes up my analyzes. Example:
>head(file1) # date format is in MONTH/day/year
DateTime Value
6/2/14 11:00:01 PM 24.111
6/3/14 1:30:01 AM 21.61
6/3/14 4:00:01 AM 20.609
>head(file2) # date format is in day/MONTH/year
DateTime Value
03/06/14 1:30:01 AM 21.588
03/06/14 4:00:01 AM 20.086
03/06/14 6:30:01 AM 18.584
I made the following loop to bind the files.
>files.names<-list.files(getwd(),pattern="*.csv")
>theList <- lapply(files.names,function(x){
> theData <- read.csv(x,skip=18) })
>theResult <- do.call(rbind,theList)
>head(theResult)
Date.Time Value
1 6/2/14 11:00:01 PM 24.111
2 6/3/14 1:30:01 AM 21.610
3 6/3/14 4:00:01 AM 20.609
4 6/3/14 6:30:01 AM 19.107
5 6/3/14 9:00:01 AM 19.608
6 6/3/14 11:30:01 AM 20.609
What I think: I am guessing that there must be a way to standardize the format of the Date.Time
column in the loop of each csv before binding them. That is, I think I have to do that before I do.call(rbind,theList)
, but not sure how (or if it is possible).
Formatting each csv file in Excel would be a pain in the ass, so I would appreciate some help :P .
Upvotes: 0
Views: 1549
Reputation: 35
Could get it right by adding an extra line of code in after read.csv
, using parse_date_time
function in lubridate
package, as recommended.
>files.names<-list.files(getwd(),pattern="*.csv")
>theList <- lapply(files.names,function(x){
>theData <- read.csv(x,skip=18)
>theData$Date.Time<-parse_date_time(x = theData$Date.Time,
orders = c("mdy HMS %p", "dmy HMS %p"), local = eng") }) ###extra line
>theResult <- do.call(rbind,theList)
My guess is that R could figure out when to it should have been mdy
or dmy
based on the orders
argument in parse_date_time
.
Upvotes: 2
Reputation: 263332
Need to use proper format strings from the ?strptime
help page:
file1$DateTime <- as.POSIXct( file1$DateTime , format="%m/%d/%y %I:%M:%S %p")
file2$DateTime <- as.POSIXct( file1$DateTime , format="%d/%m/%y %I:%M:%S %p")
The defatted format for date-time input is YYYY-MM-DD HH:MM:SS in 24 hour time. If you have an AM/PM indicator, then you need to use the %I 12 hour format spec.
> file1
DateTime Value
1 2014-06-02 23:00:01 24.111
2 2014-06-03 01:30:01 21.610
3 2014-06-03 04:00:01 20.609
> file2
DateTime Value
1 2014-06-02 23:00:01 21.588
2 2014-06-03 01:30:01 20.086
3 2014-06-03 04:00:01 18.584
Then use do.call(rbind, list(file1,file2))
to get:
DateTime Value
1 2014-06-02 23:00:01 24.111
2 2014-06-03 01:30:01 21.610
3 2014-06-03 04:00:01 20.609
4 2014-06-02 23:00:01 21.588
5 2014-06-03 01:30:01 20.086
6 2014-06-03 04:00:01 18.584
Upvotes: 1