Reputation: 25
In my dataframe, I have dates of the form
YYYY-MM-DD
YYYY.MM.DD
YYYY-MM-DD HH:MM
I want to standardise it into the form:
YYYY-MM-DD
in R.
I've tried the parse_date_time() function in R, but not all the columns are parsed. Why is that so? Any help would be appreciated :-)
Edit
An example for the usage of
parse_date_time(emails$mail_sent_date)
is this
[1] NA "2010-04-17 UTC" "2012-01-26 UTC" "2014-11-15 UTC" "2014-07-17 UTC" "2010-02-22 UTC" "2010-07-17 UTC" "2012-10-27 UTC" "2014-01-18 UTC"
[10] "2010-01-11 UTC" NA "2010-11-28 UTC" "2012-09-24 UTC" "2014-05-30 UTC" "2014-05-30 UTC" "2010-07-31 UTC" "2007-07-28 UTC" NA
[19] "2014-08-29 UTC" "2015-06-05 UTC" "2008-11-03 UTC" "2018-03-18 UTC" "2019-01-12 UTC" "2011-07-23 UTC" NA "2007-11-19 UTC" "2019-04-07 UTC"
[28] "2010-11-28 UTC" "2019-11-22 UTC" "2019-03-28 UTC" "2013-06-22 UTC" "2013-12-08 UTC" "2012-06-08 UTC" "2011-12-09 UTC" "2017-10-23 UTC" "2017-03-26 UTC"
[37] "2019-01-31 UTC" "2020-03-14 UTC" "2014-05-30 UTC" "2011-12-31 UTC" "2015-05-14 UTC" "2010-03-27 UTC" "2014-12-08 UTC" "2015-05-24 UTC" "2014-11-15 UTC"
[46] NA "2018-05-26 UTC" "2019-02-28 UTC" NA "2015-06-11 UTC" "2012-06-09 UTC" "2013-06-16 UTC" NA "2014-07-12 UTC"
[55] "2012-09-20 UTC" "2010-05-22 UTC" "2019-11-07 UTC" "2011-03-07 UTC" "2007-10-05 UTC" "2018-03-17 UTC" "2007-06-22 UTC" "2007-02-01 UTC" "2020-03-29 UTC"
[64] "2010-03-21 UTC" "2019-02-28 UTC" NA "2008-03-17 UTC" "2013-03-14 UTC" "2014-05-12 UTC" "2015-12-19 UTC" "2010-04-05 UTC" NA
[73] "2008-02-07 UTC" "2007-08-12 UTC" "2011-12-02 UTC" "2014-02-02 UTC" "2011-07-25 UTC" "2014-06-12 UTC" NA NA "2013-10-06 UTC"
[82] "2019-05-18 UTC" "2011-12-19 UTC" NA "2012-03-18 UTC" "2013-07-22 UTC" "2017-01-21 UTC" "2013-09-26 UTC" "2019-04-18 UTC" "2012-10-01 UTC"
[91] "2018-09-01 UTC" "2019-11-22 UTC" "2013-07-05 UTC" "2013-07-22 UTC" "2008-10-11 UTC" "2018-04-29 UTC" NA "2019-06-24 UTC" "2018-04-19 UTC"
[100] "2015-08-21 UTC" NA NA "2015-04-09 UTC" "2012-02-11 UTC" "2011-11-13 UTC" "2013-04-11 UTC" "2007-10-07 UTC" "2007-10-08 UTC"
[109] "2012-01-14 UTC" "2012-06-02 UTC" "2011-07-04 UTC" "2019-05-17 UTC" "2012-09-09 UTC" NA "2018-09-29 UTC" "2015-06-04 UTC" "2014-01-13 UTC"
[118] "2014-01-13 UTC" "2012-09-24 UTC" "2018-05-28 UTC" "2018-07-21 UTC" "2010-04-26 UTC" "2011-02-20 UTC" "2013-06-21 UTC" "2008-12-14 UTC" "2011-04-25 UTC"
[127] "2014-07-31 UTC" "2015-06-08 UTC" "2015-10-25 UTC" "2019-06-29 UTC" "2011-02-21 UTC" "2017-01-09 UTC" NA "2015-06-21 UTC" "2014-07-28 UTC"
[136] "2013-11-04 UTC" "2014-07-24 UTC" NA "2019-09-13 UTC" "2007-06-09 UTC" "2014-12-13 UTC" "2015-10-16 UTC" "2010-06-19 UTC" "2015-05-14 UTC"
[145] "2011-07-29 UTC" "2007-10-01 UTC" NA NA "2010-09-25 UTC" "2010-04-15 UTC" "2020-03-05 UTC" "2017-06-30 UTC" NA
[154] "2019-06-10 UTC" "2018-10-04 UTC" "2015-05-11 UTC" "2010-05-22 UTC" "2014-07-26 UTC" "2015-01-25 UTC" "2015-07-04 UTC" "2015-07-04 UTC" "2014-07-17 UTC"
[163] "2010-09-18 UTC" "2007-01-08 UTC" "2019-10-21 UTC" "2014-06-30 UTC" "2008-08-01 UTC" NA "2010-08-13 UTC" NA NA
[172] "2012-11-24 UTC" "2014-11-20 UTC" "2018-05-14 UTC" "2015-10-05 UTC" "2020-01-26 UTC" "2018-04-21 UTC" "2011-07-04 UTC" "2015-02-22 UTC" "2015-02-22 UTC"
[181] "2008-10-11 UTC" "2017-01-05 UTC" "2011-05-21 UTC" NA "2015-09-27 UTC" "2011-08-28 UTC" "2019-03-09 UTC" "2018-11-29 UTC" "2014-07-11 UTC"
[190] "2013-06-14 UTC" "2018-06-04 UTC" "2014-11-03 UTC" "2019-03-01 UTC" "2007-10-12 UTC" "2018-01-06 UTC" NA "2010-11-28 UTC" "2017-10-23 UTC"
[199] "2014-03-23 UTC" "2018-11-11 UTC" "2019-05-18 UTC" "2014-10-02 UTC" NA NA "2011-07-31 UTC" "2010-07-16 UTC" "2015-04-09 UTC"
[208] "2015-10-01 UTC" "2015-10-09 UTC" "2011-04-01 UTC" "2018-11-11 UTC" "2018-11-11 UTC" "2011-08-28 UTC" "2018-07-21 UTC" NA "2011-02-21 UTC"
[217] "2018-03-17 UTC" NA "2014-05-11 UTC" "2012-03-23 UTC" "2014-05-25 UTC" "2014-03-23 UTC" "2013-01-20 UTC" NA "2014-07-11 UTC"
[226] "2014-09-08 UTC" "2013-05-24 UTC" NA "2010-07-17 UTC" NA "2019-01-01 UTC" NA "2013-06-15 UTC" "2019-01-19 UTC"
[235] "2020-02-02 UTC" "2013-03-14 UTC" "2012-08-04 UTC" "2015-02-13 UTC" "2010-06-18 UTC" NA "2013-10-20 UTC" "2015-12-17 UTC" "2017-09-01 UTC"
[244] "2013-03-28 UTC" "2010-04-01 UTC" "2017-07-24 UTC" "2007-09-30 UTC" "2017-05-27 UTC" NA "2006-11-17 UTC" "2007-11-18 UTC" "2019-12-01 UTC"
[253] "2015-10-12 UTC" "2015-03-27 UTC" "2017-12-02 UTC" "2018-09-03 UTC" "2018-03-04 UTC" "2015-03-14 UTC" NA "2010-01-25 UTC" "2008-07-04 UTC"
[262] "2015-04-29 UTC" "2013-04-05 UTC" NA "2007-11-02 UTC" "2010-06-13 UTC" "2019-02-16 UTC" "2015-04-09 UTC" "2013-07-27 UTC" NA
[271] "2018-08-25 UTC" "2019-06-14 UTC"
Warning message:
39 failed to parse.
A similar warning message returned when I used ymd()
Upvotes: 0
Views: 59
Reputation: 269870
1) Assuming that the formats are precisely the ones shown in the question (if not please fix the question) then this uses only base R. This makes use of the fact that as.Date will ignore junk at the end.
x <- c("2000-10-01", "2000.10.01", "2000-10-01 03:04")
as.Date(chartr(".", "-", x))
## [1] "2000-10-01" "2000-10-01" "2000-10-01"
2) Another approach is the anytime package:
library(anytime)
anydate(x)
## [1] "2000-10-01" "2000-10-01" "2000-10-01"
Upvotes: 1
Reputation: 79164
Use lubridate
package and ymd
function
library(lubridate)
ymd(column_of_your_dataframe)
Upvotes: 0