Reputation: 63
I have a csv file with a column containing dates, but the dates are in two different formats: "m/d/y H:M" and "y m d H:M:S". I want to make a new column with these dates all in one format (I don't care which one). I tried the parse_date_time function but it would only work for one of the formats, not both. How can I go about doing this?
Here is the code I was trying to use:
newdata <- mutate(data,
newcolumn = parse_date_time(x = data$date_column,
orders = c("m/d/y H:M", "y m d H:M:S"),
locale = "eng") )
Here are some example dates from the column:
x <- c("6/21/2006 0:00",
"1889-06-13 00:00:00",
"6/28/2012 0:00",
"5/19/2015 0:00",
"6/6/2016 0:00",
"1884-05-24 00:00:00",
"7/28/2013 0:00")
Upvotes: 4
Views: 1772
Reputation: 10432
Using lubridate::parse_date_time()
:
library(lubridate)
library(dplyr)
x <- c("6/21/2006 0:00",
"1889-06-13 00:00:00",
"6/28/2012 0:00",
"5/19/2015 0:00",
"6/6/2016 0:00",
"1884-05-24 00:00:00",
"7/28/2013 0:00")
df <- data_frame(date_column = x)
df_new <- df %>%
mutate(new_column = parse_date_time(date_column, orders = c('ymdHMS', "mdyHM")))
df_new
# A tibble: 7 x 2
date_column new_column
<chr> <dttm>
1 6/21/2006 0:00 2006-06-21 00:00:00
2 1889-06-13 00:00:00 1889-06-13 00:00:00
3 6/28/2012 0:00 2012-06-28 00:00:00
4 5/19/2015 0:00 2015-05-19 00:00:00
5 6/6/2016 0:00 2016-06-06 00:00:00
6 1884-05-24 00:00:00 1884-05-24 00:00:00
7 7/28/2013 0:00 2013-07-28 00:00:00
Upvotes: 3
Reputation: 368609
The anytime
package does just that -- heuristically evaluating plausible formats:
R> library(anytime)
R> x <- c("6/21/2006 0:00",
+ "1889-06-13 00:00:00",
+ "6/28/2012 0:00",
+ "5/19/2015 0:00",
+ "6/6/2016 0:00",
+ "1884-05-24 00:00:00",
+ "7/28/2013 0:00")
R> anytime(x)
[1] "2006-06-21 CDT" "1889-06-13 CST" "2012-06-28 CDT"
[4] "2015-05-19 CDT" NA "1884-05-24 CST"
[7] "2013-07-28 CDT"
R>
It uses Boost's date_time library parser by default, and that one does not do single digit month/day, hence the NA
on element six. But we also added R's parser as a fallback:
R> anytime(x, useR=TRUE)
[1] "2006-06-21 CDT" "1889-06-13 CST" "2012-06-28 CDT"
[4] "2015-05-19 CDT" "2016-06-06 CDT" "1884-05-24 CST"
[7] "2013-07-28 CDT"
R>
So here is all just works without a single format specification.
Upvotes: 3
Reputation: 7630
So we start by separating the two:
x <- c("03/20/2018 10:42", "2018-03-20 10:37:02")
DF <- data.frame(x = x, stringsAsFactors = FALSE)
slash_index <- grep("/", DF$x)
slash <- DF$x[slash_index]
dash <- DF$x[-slash_index]
Then we convert them. I like lubridate, but you can use your method if you'd like
library(lubridate)
slash <- mdy_hm(slash)
dash <- ymd_hms(dash)
Then we put them into a date vector:
date_times <- integer(0)
date_times[slash_index] <- slash
date_times[seq_along(DF$x)[-slash_index]] <- dash
DF$x <- as.POSIXct(date_times, origin = "1970-01-01 00:00:00")
DF
# x
# 1 2018-03-20 03:42:02
# 2 2018-03-20 03:37:02
Note:
The tricky part here was re-assigning parts of a vector to a vector according to their index. When a portion of a vector was assigned to a POSIXct
object, it had its attributes stripped, turning it into the internal integer code for the date time. This was resolved by stripping the attributes at the beginning, and then re-assigning the class at the end.
Here's the full thing with your example:
install.packages("lubridate")
library(lubridate)
x <- c("6/21/2006 0:00",
"1889-06-13 00:00:00",
"6/28/2012 0:00",
"5/19/2015 0:00",
"6/6/2016 0:00",
"1884-05-24 00:00:00",
"7/28/2013 0:00")
DF <- data.frame(x = x, stringsAsFactors = FALSE)
slash_index <- grep("/", DF$x)
slash <- DF$x[slash_index]
dash <- DF$x[-slash_index]
slash <- mdy_hm(slash)
dash <- ymd_hms(dash)
date_times <- integer(0)
date_times[slash_index] <- slash
date_times[seq_along(DF$x)[-slash_index]] <- dash
DF$x <- as.POSIXct(date_times, origin = "1970-01-01 00:00:00", tz = "UTC")
DF
# x
# 1 2006-06-21
# 2 1889-06-13
# 3 2012-06-28
# 4 2015-05-19
# 5 2016-06-06
# 6 1884-05-24
# 7 2013-07-28
Because the times for these are all "00:00:00"
, they've been truncated. You can display them with the "00:00:00"
using the method described in answers to this question.
Upvotes: 0